INN Hotels Project
Context
A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is a less desirable and possibly revenue-diminishing factor for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
- Loss of resources (revenue) when the hotel cannot resell the room.
- Additional costs of distribution channels by increasing commissions or paying for publicity to help sell these rooms.
- Lowering prices last minute, so the hotel can resell a room, resulting in reducing the profit margin.
- Human resources to make arrangements for the guests.
Objective
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
Data Description
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
- Booking_ID: unique identifier of each booking
- no_of_adults: Number of adults
- no_of_children: Number of Children
- no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
- no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
- type_of_meal_plan: Type of meal plan booked by the customer:
- Not Selected – No meal plan selected
- Meal Plan 1 – Breakfast
- Meal Plan 2 – Half board (breakfast and one other meal)
- Meal Plan 3 – Full board (breakfast, lunch, and dinner)
- required_car_parking_space: Does the customer require a car parking space? (0 - No, 1- Yes)
- room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels.
- lead_time: Number of days between the date of booking and the arrival date
- arrival_year: Year of arrival date
- arrival_month: Month of arrival date
- arrival_date: Date of the month
- market_segment_type: Market segment designation.
- repeated_guest: Is the customer a repeated guest? (0 - No, 1- Yes)
- no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking
- no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
- avg_price_per_room: Average price per day of the reservation; prices of the rooms are dynamic. (in euros)
- no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room, etc)
- booking_status: Flag indicating if the booking was canceled or not.
Importing necessary libraries and data
# Installing the libraries with the specified version.
!pip install pandas==1.5.3 numpy==1.25.2 matplotlib==3.7.1 seaborn==0.13.1 scikit-learn==1.2.2 statsmodels==0.14.1 -q --user
Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.
# mount Google Drive in Google Colab to read the data file
from google.colab import drive
drive.mount("/content/drive", force_remount=True)
Mounted at /content/drive
# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# import library for probability distribution and statistic data
import scipy.stats as stats
# statistical libraries for Python
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
# prediction libraries for Python (Train/Test + Tree)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.metrics import (
    accuracy_score,
    recall_score,
    confusion_matrix,
    make_scorer,
    roc_auc_score,
    roc_curve,
)
# disable display column & row limits
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 250)
# ignore warnings
import warnings
warnings.filterwarnings("ignore")
# read the dataset with pandas; df keeps the raw data and data is the working copy
df = pd.read_csv("/content/drive/MyDrive/ColabNotebooks/INNHotelsGroup.csv")
data = df.copy()
Data Overview
- Observations
- Sanity checks
# view first five rows of data
df.head()
| Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
# view last five rows of data
df.tail()
| Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36270 | INN36271 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80 | 1 | Not_Canceled |
| 36271 | INN36272 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95 | 2 | Canceled |
| 36272 | INN36273 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | Not_Canceled |
| 36273 | INN36274 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 36274 | INN36275 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | Not_Canceled |
# set the seed so the same random sample is drawn every time
np.random.seed(1)
df.sample(n=50)
| Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30392 | INN30393 | 1 | 0 | 1 | 0 | Not Selected | 0 | Room_Type 1 | 53 | 2018 | 9 | 11 | Online | 0 | 0 | 0 | 94.32 | 0 | Not_Canceled |
| 6685 | INN06686 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 63 | 2018 | 4 | 22 | Online | 0 | 0 | 0 | 105.30 | 1 | Canceled |
| 8369 | INN08370 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 55 | 2018 | 9 | 11 | Online | 0 | 0 | 0 | 106.24 | 0 | Not_Canceled |
| 2055 | INN02056 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 53 | 2017 | 12 | 29 | Online | 0 | 0 | 0 | 81.00 | 1 | Not_Canceled |
| 10969 | INN10970 | 1 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 245 | 2018 | 7 | 6 | Offline | 0 | 0 | 0 | 110.00 | 0 | Canceled |
| 24881 | INN24882 | 2 | 0 | 3 | 7 | Meal Plan 1 | 0 | Room_Type 2 | 231 | 2018 | 8 | 1 | Online | 0 | 0 | 0 | 81.82 | 2 | Canceled |
| 28658 | INN28659 | 2 | 0 | 0 | 3 | Meal Plan 2 | 0 | Room_Type 1 | 71 | 2018 | 5 | 10 | Offline | 0 | 0 | 0 | 126.00 | 1 | Not_Canceled |
| 20853 | INN20854 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 66 | 2017 | 10 | 9 | Offline | 0 | 0 | 0 | 75.00 | 0 | Canceled |
| 8501 | INN08502 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 2 | 40 | 2018 | 1 | 14 | Online | 0 | 0 | 0 | 77.55 | 1 | Not_Canceled |
| 1942 | INN01943 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 63 | 2018 | 8 | 9 | Online | 0 | 0 | 0 | 144.90 | 2 | Not_Canceled |
| 15648 | INN15649 | 2 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 209 | 2018 | 7 | 2 | Online | 0 | 0 | 0 | 66.53 | 1 | Not_Canceled |
| 6116 | INN06117 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 9 | 2018 | 7 | 6 | Online | 0 | 0 | 0 | 139.00 | 1 | Not_Canceled |
| 7868 | INN07869 | 2 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 123 | 2018 | 5 | 22 | Online | 0 | 0 | 0 | 114.75 | 1 | Canceled |
| 24527 | INN24528 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 118 | 2018 | 6 | 28 | Online | 0 | 0 | 0 | 96.30 | 0 | Canceled |
| 24227 | INN24228 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 4 | 72 | 2018 | 10 | 9 | Online | 0 | 0 | 0 | 132.30 | 3 | Not_Canceled |
| 17216 | INN17217 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 11 | 23 | Online | 0 | 0 | 0 | 120.00 | 0 | Not_Canceled |
| 31124 | INN31125 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 7 | 2017 | 8 | 28 | Corporate | 1 | 1 | 2 | 65.00 | 0 | Not_Canceled |
| 9101 | INN09102 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 12 | 2018 | 10 | 2 | Online | 0 | 0 | 0 | 6.00 | 0 | Not_Canceled |
| 9474 | INN09475 | 2 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 63 | 2017 | 9 | 4 | Offline | 0 | 0 | 0 | 116.00 | 0 | Not_Canceled |
| 12782 | INN12783 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 18 | 2018 | 1 | 21 | Online | 0 | 0 | 0 | 77.00 | 1 | Not_Canceled |
| 28297 | INN28298 | 1 | 0 | 2 | 5 | Meal Plan 1 | 0 | Room_Type 1 | 68 | 2018 | 8 | 29 | Online | 0 | 0 | 0 | 92.35 | 1 | Not_Canceled |
| 22021 | INN22022 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 42 | 2018 | 11 | 4 | Offline | 0 | 0 | 0 | 72.00 | 0 | Not_Canceled |
| 26586 | INN26587 | 3 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 53 | 2018 | 3 | 20 | Online | 0 | 0 | 0 | 124.10 | 1 | Canceled |
| 16756 | INN16757 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 305 | 2018 | 11 | 4 | Offline | 0 | 0 | 0 | 89.00 | 0 | Canceled |
| 20927 | INN20928 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 106 | 2018 | 7 | 8 | Offline | 0 | 0 | 0 | 72.25 | 2 | Not_Canceled |
| 35753 | INN35754 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 128 | 2018 | 6 | 20 | Online | 0 | 0 | 0 | 120.00 | 0 | Canceled |
| 9359 | INN09360 | 1 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 7 | 2018 | 5 | 16 | Online | 0 | 0 | 0 | 97.00 | 1 | Not_Canceled |
| 21929 | INN21930 | 1 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 48 | 2018 | 8 | 24 | Online | 0 | 0 | 0 | 149.40 | 1 | Not_Canceled |
| 17501 | INN17502 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 27 | 2018 | 3 | 22 | Online | 0 | 0 | 0 | 129.00 | 0 | Canceled |
| 3355 | INN03356 | 2 | 1 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 173 | 2018 | 8 | 13 | Online | 0 | 0 | 0 | 114.75 | 1 | Canceled |
| 22183 | INN22184 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 9 | 2018 | 6 | 1 | Online | 0 | 0 | 0 | 97.02 | 1 | Not_Canceled |
| 7818 | INN07819 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 57 | 2018 | 12 | 1 | Online | 0 | 0 | 0 | 79.20 | 2 | Not_Canceled |
| 26360 | INN26361 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 245 | 2018 | 6 | 17 | Offline | 0 | 0 | 0 | 75.00 | 0 | Canceled |
| 15193 | INN15194 | 2 | 0 | 0 | 3 | Meal Plan 2 | 0 | Room_Type 1 | 36 | 2017 | 10 | 13 | Offline | 0 | 0 | 0 | 112.00 | 0 | Not_Canceled |
| 19873 | INN19874 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 38 | 2018 | 7 | 2 | Online | 0 | 0 | 0 | 107.10 | 1 | Not_Canceled |
| 8015 | INN08016 | 1 | 1 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 29 | 2018 | 12 | 17 | Online | 0 | 0 | 0 | 130.00 | 0 | Not_Canceled |
| 36151 | INN36152 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 26 | 2018 | 2 | 7 | Online | 0 | 0 | 0 | 64.64 | 0 | Not_Canceled |
| 21254 | INN21255 | 1 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 6 | 2018 | 5 | 23 | Online | 0 | 0 | 0 | 97.02 | 1 | Not_Canceled |
| 16881 | INN16882 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 258 | 2018 | 10 | 16 | Offline | 0 | 0 | 0 | 110.00 | 0 | Canceled |
| 9465 | INN09466 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 118 | 2018 | 10 | 29 | Online | 0 | 0 | 0 | 104.40 | 1 | Not_Canceled |
| 18785 | INN18786 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 188 | 2018 | 6 | 15 | Offline | 0 | 0 | 0 | 130.00 | 0 | Canceled |
| 28097 | INN28098 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 39 | 2018 | 3 | 14 | Offline | 0 | 0 | 0 | 85.00 | 0 | Not_Canceled |
| 29239 | INN29240 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 32 | 2017 | 11 | 20 | Offline | 0 | 0 | 0 | 73.00 | 0 | Not_Canceled |
| 664 | INN00665 | 2 | 1 | 1 | 1 | Meal Plan 1 | 1 | Room_Type 1 | 27 | 2018 | 8 | 8 | Online | 0 | 0 | 0 | 195.50 | 2 | Not_Canceled |
| 8285 | INN08286 | 2 | 0 | 2 | 1 | Not Selected | 0 | Room_Type 1 | 240 | 2018 | 12 | 10 | Online | 0 | 0 | 0 | 67.50 | 2 | Canceled |
| 4598 | INN04599 | 2 | 0 | 1 | 0 | Not Selected | 0 | Room_Type 1 | 127 | 2018 | 7 | 25 | Online | 0 | 0 | 0 | 94.50 | 1 | Not_Canceled |
| 21340 | INN21341 | 2 | 2 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 6 | 24 | 2018 | 4 | 14 | Online | 0 | 0 | 0 | 207.00 | 2 | Not_Canceled |
| 12098 | INN12099 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 262 | 2018 | 12 | 26 | Online | 0 | 0 | 0 | 73.95 | 0 | Canceled |
| 2904 | INN02905 | 2 | 0 | 0 | 5 | Meal Plan 1 | 0 | Room_Type 4 | 41 | 2018 | 10 | 4 | Online | 0 | 0 | 0 | 139.50 | 2 | Not_Canceled |
| 21902 | INN21903 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 101 | 2018 | 5 | 5 | Online | 0 | 0 | 0 | 132.60 | 1 | Not_Canceled |
# view the size of the data set
df.shape
(36275, 19)
# view the data types of a data set
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 36275 entries, 0 to 36274 Data columns (total 19 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Booking_ID 36275 non-null object 1 no_of_adults 36275 non-null int64 2 no_of_children 36275 non-null int64 3 no_of_weekend_nights 36275 non-null int64 4 no_of_week_nights 36275 non-null int64 5 type_of_meal_plan 36275 non-null object 6 required_car_parking_space 36275 non-null int64 7 room_type_reserved 36275 non-null object 8 lead_time 36275 non-null int64 9 arrival_year 36275 non-null int64 10 arrival_month 36275 non-null int64 11 arrival_date 36275 non-null int64 12 market_segment_type 36275 non-null object 13 repeated_guest 36275 non-null int64 14 no_of_previous_cancellations 36275 non-null int64 15 no_of_previous_bookings_not_canceled 36275 non-null int64 16 avg_price_per_room 36275 non-null float64 17 no_of_special_requests 36275 non-null int64 18 booking_status 36275 non-null object dtypes: float64(1), int64(13), object(5) memory usage: 5.3+ MB
# check for duplicate values
df[df.duplicated()].count()
| 0 | |
|---|---|
| Booking_ID | 0 |
| no_of_adults | 0 |
| no_of_children | 0 |
| no_of_weekend_nights | 0 |
| no_of_week_nights | 0 |
| type_of_meal_plan | 0 |
| required_car_parking_space | 0 |
| room_type_reserved | 0 |
| lead_time | 0 |
| arrival_year | 0 |
| arrival_month | 0 |
| arrival_date | 0 |
| market_segment_type | 0 |
| repeated_guest | 0 |
| no_of_previous_cancellations | 0 |
| no_of_previous_bookings_not_canceled | 0 |
| avg_price_per_room | 0 |
| no_of_special_requests | 0 |
| booking_status | 0 |
# check for null values (all zeros means the data is complete)
df.isnull().sum()
| 0 | |
|---|---|
| Booking_ID | 0 |
| no_of_adults | 0 |
| no_of_children | 0 |
| no_of_weekend_nights | 0 |
| no_of_week_nights | 0 |
| type_of_meal_plan | 0 |
| required_car_parking_space | 0 |
| room_type_reserved | 0 |
| lead_time | 0 |
| arrival_year | 0 |
| arrival_month | 0 |
| arrival_date | 0 |
| market_segment_type | 0 |
| repeated_guest | 0 |
| no_of_previous_cancellations | 0 |
| no_of_previous_bookings_not_canceled | 0 |
| avg_price_per_room | 0 |
| no_of_special_requests | 0 |
| booking_status | 0 |
# drop Booking_ID, a unique identifier with no predictive value
data = data.drop('Booking_ID', axis=1)
# view the values in each object-dtype column
cat_columns = ['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status']
for i in cat_columns:
    print(data[i].value_counts())
    print("*" * 50)
type_of_meal_plan Meal Plan 1 27835 Not Selected 5130 Meal Plan 2 3305 Meal Plan 3 5 Name: count, dtype: int64 ************************************************** room_type_reserved Room_Type 1 28130 Room_Type 4 6057 Room_Type 6 966 Room_Type 2 692 Room_Type 5 265 Room_Type 7 158 Room_Type 3 7 Name: count, dtype: int64 ************************************************** market_segment_type Online 23214 Offline 10528 Corporate 2017 Complementary 391 Aviation 125 Name: count, dtype: int64 ************************************************** booking_status Not_Canceled 24390 Canceled 11885 Name: count, dtype: int64 **************************************************
data.describe()
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 |
| mean | 1.844962 | 0.105279 | 0.810724 | 2.204300 | 0.030986 | 85.232557 | 2017.820427 | 7.423653 | 15.596995 | 0.025637 | 0.023349 | 0.153411 | 103.423539 | 0.619655 |
| std | 0.518715 | 0.402648 | 0.870644 | 1.410905 | 0.173281 | 85.930817 | 0.383836 | 3.069894 | 8.740447 | 0.158053 | 0.368331 | 1.754171 | 35.089424 | 0.786236 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2017.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 17.000000 | 2018.000000 | 5.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 80.300000 | 0.000000 |
| 50% | 2.000000 | 0.000000 | 1.000000 | 2.000000 | 0.000000 | 57.000000 | 2018.000000 | 8.000000 | 16.000000 | 0.000000 | 0.000000 | 0.000000 | 99.450000 | 0.000000 |
| 75% | 2.000000 | 0.000000 | 2.000000 | 3.000000 | 0.000000 | 126.000000 | 2018.000000 | 10.000000 | 23.000000 | 0.000000 | 0.000000 | 0.000000 | 120.000000 | 1.000000 |
| max | 4.000000 | 10.000000 | 7.000000 | 17.000000 | 1.000000 | 443.000000 | 2018.000000 | 12.000000 | 31.000000 | 1.000000 | 13.000000 | 58.000000 | 540.000000 | 5.000000 |
data.describe(include=['object'])
| type_of_meal_plan | room_type_reserved | market_segment_type | booking_status | |
|---|---|---|---|---|
| count | 36275 | 36275 | 36275 | 36275 |
| unique | 4 | 7 | 5 | 2 |
| top | Meal Plan 1 | Room_Type 1 | Online | Not_Canceled |
| freq | 27835 | 28130 | 23214 | 24390 |
Observations
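Five columns have the object dtype: Booking_ID, type_of_meal_plan, room_type_reserved, market_segment_type, and booking_status. Booking_ID is dropped because it is only a unique identifier, and booking_status is converted to True/False during preprocessing.
The data set has 36,275 rows and 19 columns, with no missing values and no duplicated rows, so it is a well-maintained data set.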
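The duplicate and null checks above print one row per column; the same checks can be condensed into a few plain numbers. The following is a minimal sketch on a small made-up frame. The `toy` data and the `sanity_report` helper are hypothetical and not part of the notebook.

```python
import pandas as pd


def sanity_report(frame: pd.DataFrame) -> dict:
    """Summarise shape, duplicated rows and missing cells as plain numbers."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "duplicate_rows": int(frame.duplicated().sum()),
        "missing_cells": int(frame.isnull().sum().sum()),
    }


# Hypothetical toy data: row 2 repeats row 0, and one price is missing.
toy = pd.DataFrame(
    {
        "lead_time": [10, 20, 10, 30],
        "avg_price_per_room": [65.0, None, 65.0, 90.0],
    }
)

report = sanity_report(toy)
print(report)
```

Applied to the notebook's `df`, both `duplicate_rows` and `missing_cells` would be expected to be 0, in line with the observations above.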
Exploratory Data Analysis (EDA)
- EDA is an important part of any project involving data.
- It is important to investigate and understand the data better before building a model with it.
- A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
- A thorough analysis of the data, in addition to the questions mentioned below, should be done.
Leading Questions:
- What are the busiest months in the hotel?
- Which market segment do most of the guests come from?
- Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
- What percentage of bookings are canceled?
- Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
- Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
# To create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the bar with its percentage or count
    plt.show()  # show the plot
# To plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # number of rows of the subplot grid = 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a marker will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # histogram, with explicit bins only when they are given
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # add the mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # add the median to the histogram
labeled_barplot(data=data, feature="arrival_month", perc=True, n=None)
labeled_barplot(data, 'booking_status', perc=True, n=None)
labeled_barplot(data, 'market_segment_type', perc=True, n=None)
labeled_barplot(data, 'no_of_adults', perc=True)
labeled_barplot(df, 'no_of_children', perc=True)
labeled_barplot(df, 'no_of_weekend_nights', perc=True)
labeled_barplot(df,'no_of_week_nights', perc=True)
labeled_barplot(df, 'required_car_parking_space', perc=True)
labeled_barplot(df,'room_type_reserved', perc=True)
labeled_barplot(df,'type_of_meal_plan',perc=True)
labeled_barplot(df,'lead_time',perc=True)
histogram_boxplot(df,'no_of_previous_cancellations')
histogram_boxplot(df,'no_of_previous_bookings_not_canceled')
histogram_boxplot(data=df, feature="lead_time")
# find bookings where the room was free (average price of 0)
data[data['avg_price_per_room']==0]
| no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2017 | 9 | 10 | Complementary | 0 | 0 | 0 | 0.0 | 1 | Not_Canceled |
| 145 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 13 | 2018 | 6 | 1 | Complementary | 1 | 3 | 5 | 0.0 | 1 | Not_Canceled |
| 209 | 1 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2018 | 2 | 27 | Complementary | 0 | 0 | 0 | 0.0 | 1 | Not_Canceled |
| 266 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2017 | 8 | 12 | Complementary | 1 | 0 | 1 | 0.0 | 1 | Not_Canceled |
| 267 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2017 | 8 | 23 | Complementary | 0 | 0 | 0 | 0.0 | 1 | Not_Canceled |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35983 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 6 | 7 | Complementary | 1 | 4 | 17 | 0.0 | 1 | Not_Canceled |
| 36080 | 1 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 3 | 21 | Complementary | 1 | 3 | 15 | 0.0 | 1 | Not_Canceled |
| 36114 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 3 | 2 | Online | 0 | 0 | 0 | 0.0 | 0 | Not_Canceled |
| 36217 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 2 | 3 | 2017 | 8 | 9 | Online | 0 | 0 | 0 | 0.0 | 2 | Not_Canceled |
| 36250 | 1 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 6 | 2017 | 12 | 10 | Online | 0 | 0 | 0 | 0.0 | 0 | Not_Canceled |
545 rows × 18 columns
df.loc[df['avg_price_per_room']==0, 'market_segment_type'].value_counts()
| count | |
|---|---|
| market_segment_type | |
| Complementary | 354 |
| Online | 191 |
sns.boxplot(data=df, y='avg_price_per_room' , x='market_segment_type')
<Axes: xlabel='market_segment_type', ylabel='avg_price_per_room'>
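The boxplot compares prices across segments visually; a numeric summary makes the same comparison easier to quote. The following is a minimal sketch on hypothetical toy bookings (the `toy` frame and its values are made up; the notebook's `df` could be passed in its place).

```python
import pandas as pd

# Hypothetical toy bookings, not taken from the INN Hotels data.
toy = pd.DataFrame(
    {
        "market_segment_type": ["Online", "Online", "Offline", "Offline", "Complementary"],
        "avg_price_per_room": [120.0, 100.0, 90.0, 80.0, 0.0],
    }
)

# Median, mean and booking count of the room price per segment,
# sorted so the segment with the highest median price comes first.
price_by_segment = (
    toy.groupby("market_segment_type")["avg_price_per_room"]
    .agg(["median", "mean", "count"])
    .sort_values("median", ascending=False)
)
print(price_by_segment)
```

The median is used for sorting because free rooms (price 0) pull the mean down more than the median.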
sns.catplot(data=df ,y='no_of_special_requests', hue='booking_status', kind='count')
<seaborn.axisgrid.FacetGrid at 0x798549905510>
plt.figure(figsize=(20,10))
sns.heatmap(df.select_dtypes(include=np.number).corr(), annot=True, vmin=-1, vmax=1, fmt='.2f')
<Axes: >
# how does lead time affect cancellation?
sns.catplot(data=df, x='lead_time', hue='booking_status', kind='count')
plt.xticks(rotation=45)
plt.show()
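A count plot over every distinct `lead_time` value produces hundreds of bars, so the relationship can be hard to read. One possible alternative is to bin the lead time and compare cancellation rates per bin. The following is a minimal sketch on hypothetical toy data; the bin edges are an assumption chosen for illustration, not something the notebook defines.

```python
import pandas as pd

# Hypothetical toy bookings, not taken from the INN Hotels data.
toy = pd.DataFrame(
    {
        "lead_time": [0, 5, 40, 80, 150, 200, 300, 400],
        "booking_status": [
            "Not_Canceled", "Not_Canceled", "Canceled", "Not_Canceled",
            "Canceled", "Canceled", "Canceled", "Canceled",
        ],
    }
)

# Assumed bin edges in days; include_lowest keeps lead_time == 0 in the first bin.
bins = [0, 30, 90, 180, 450]
toy["lead_time_bin"] = pd.cut(toy["lead_time"], bins=bins, include_lowest=True)

# Share of canceled bookings within each lead-time bin.
cancel_rate = (
    toy["booking_status"].eq("Canceled")
    .groupby(toy["lead_time_bin"], observed=False)
    .mean()
)
print(cancel_rate)
```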
# booking status counts by number of weekend nights
sns.catplot(data=df, x='no_of_weekend_nights', hue='booking_status', kind='count')
plt.show()
# average room price by arrival month, to see how prices vary across the year
plt.figure(figsize=(10,5))
sns.lineplot(data=df, x='arrival_month', y='avg_price_per_room')
plt.show()
# new column for the total length of stay (weekend nights + week nights)
data['length_stay'] = data['no_of_weekend_nights'] + data['no_of_week_nights']
sns.pairplot(df[['no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space','lead_time', 'avg_price_per_room', 'no_of_special_requests',
'type_of_meal_plan','room_type_reserved','market_segment_type','booking_status', ]]);
Leading Questions Answered:
What are the busiest months in the hotel?
Answer: October, with 14.7% of all bookings.
Which market segment do most of the guests come from?
Answer: Online. 23,214 bookings (64%) come via the internet.
Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
Answer: Online bookings have the highest prices, even though 191 online bookings were free rooms (second only to the 354 free complementary bookings). Aviation, offline, and corporate bookings are generally priced slightly lower, with corporate the lowest of the three.
What percentage of bookings are canceled?
Answer: About one third of the bookings in the given data are canceled (11,885 of 36,275, or 32.8%).
Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
Answer: Repeating guests rarely cancel (1.75%).
Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
Answer: Yes. Cancellations start to decrease with the first special request and keep decreasing as the number of requests grows, reaching zero at the highest request counts.
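The percentages quoted in these answers can be computed directly rather than read off the plots. The following is a minimal sketch on a hypothetical six-row frame; the column names match the data dictionary, but the values are made up, so the printed numbers are not the notebook's results.

```python
import pandas as pd

# Hypothetical toy bookings, not taken from the INN Hotels data.
toy = pd.DataFrame(
    {
        "arrival_month": [10, 10, 8, 9, 10, 8],
        "market_segment_type": ["Online", "Online", "Offline", "Online", "Corporate", "Online"],
        "repeated_guest": [0, 0, 1, 0, 1, 0],
        "no_of_special_requests": [0, 0, 1, 2, 1, 0],
        "booking_status": [
            "Canceled", "Not_Canceled", "Not_Canceled",
            "Not_Canceled", "Not_Canceled", "Canceled",
        ],
    }
)
canceled = toy["booking_status"].eq("Canceled")  # boolean mask of canceled bookings

# Share of bookings per arrival month and per market segment, in percent.
month_share = toy["arrival_month"].value_counts(normalize=True) * 100
segment_share = toy["market_segment_type"].value_counts(normalize=True) * 100

# Overall cancellation rate and the rate among repeated guests, in percent.
overall_cancel_pct = canceled.mean() * 100
repeat_cancel_pct = canceled[toy["repeated_guest"] == 1].mean() * 100

# Cancellation rate for each number of special requests, in percent.
cancel_by_requests = canceled.groupby(toy["no_of_special_requests"]).mean() * 100

print(month_share.idxmax(), segment_share.idxmax())
print(round(overall_cancel_pct, 1), round(repeat_cancel_pct, 1))
print(cancel_by_requests.round(1))
```

Taking the mean of a boolean mask gives the fraction of `True` values, which is why every rate here is a `.mean()` call.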
Data Preprocessing
- Missing value treatment (if needed)
- Feature engineering (if needed)
- Outlier detection and treatment (if needed)
- Preparing data for modeling
- Any other preprocessing steps (if needed)
data.loc[data['booking_status']=='Not_Canceled', 'booking_status'] = False
data.loc[data['booking_status']=='Canceled', 'booking_status'] = True
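The two `.loc` assignments above write booleans into a column that started out holding strings, so the column may keep the `object` dtype. One possible alternative is to map the labels explicitly and fail loudly on anything unexpected. The following is a minimal sketch; `STATUS_MAP` and `encode_status` are hypothetical names, and the integer encoding is an assumption that differs from the True/False values the notebook uses.

```python
import pandas as pd

# Assumed encoding: 1 marks a canceled booking, 0 a booking that was kept.
STATUS_MAP = {"Not_Canceled": 0, "Canceled": 1}


def encode_status(status: pd.Series) -> pd.Series:
    """Map booking_status labels to integers, rejecting unknown labels."""
    encoded = status.map(STATUS_MAP)
    if encoded.isnull().any():
        raise ValueError("unexpected booking_status label")
    return encoded.astype("int64")


# Hypothetical toy labels, not taken from the INN Hotels data.
toy_status = pd.Series(["Not_Canceled", "Canceled", "Canceled"], name="booking_status")
encoded = encode_status(toy_status)
print(encoded.tolist(), encoded.dtype)
```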
# exclude arrival_year: it is a time column and not helpful in a boxplot
numeric_columns = data.select_dtypes(include=np.number).columns.tolist()
numeric_columns.remove("arrival_year")
# draw one boxplot per remaining numeric column on a single set of axes
plt.figure(figsize=(15, 12))
data[numeric_columns].boxplot()
plt.xticks(rotation=90)
plt.show()
# avg_price_per_room: count values that lie more than 4*IQR away from the median
Quartiles = np.quantile(data['avg_price_per_room'][data['avg_price_per_room'].notnull()], [.25,.75])
power_4iqr = 4 * (Quartiles[1] - Quartiles[0])
print(f'Q1 = {Quartiles[0]}, Q3 = {Quartiles[1]}, 4*IQR = {power_4iqr}')
outlier_powers = data.loc[np.abs(data['avg_price_per_room'] - data['avg_price_per_room'].median()) > power_4iqr, 'avg_price_per_room']
outlier_powers.shape
Q1 = 80.3, Q3 = 120.0, 4*IQR = 158.8
(49,)
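With the printed values, the rule flags prices more than 158.8 away from the median. The median of avg_price_per_room is 99.45 (the 50% row of the describe output), so the flagged prices are those above 99.45 + 158.8 = 258.25; the lower bound is negative and cannot be reached. The same rule can be wrapped in a reusable function. The following is a minimal sketch on made-up prices; `far_from_median` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd


def far_from_median(values: pd.Series, k: float = 4.0) -> pd.Series:
    """Return the values lying more than k * IQR away from the median."""
    clean = values.dropna()
    q1, q3 = np.quantile(clean, [0.25, 0.75])
    threshold = k * (q3 - q1)  # k times the interquartile range
    return clean[(clean - clean.median()).abs() > threshold]


# Hypothetical toy prices: Q1 = 92.5, Q3 = 117.5, so 4 * IQR = 100 and the median is 105.
toy_prices = pd.Series([80.0, 90.0, 100.0, 110.0, 120.0, 540.0])
outliers = far_from_median(toy_prices)
print(outliers.tolist())
```

Only 540.0 is more than 100 away from the median of 105, so it is the single value returned.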
# creating a list of columns
dist_cols = [
item for item in data.select_dtypes(include=np.number).columns
]
plt.figure(figsize=(15, 45))
# looping over the list and plotting histograms
for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    plt.hist(data[dist_cols[i]], bins=50)
    plt.tight_layout()
    plt.title(dist_cols[i], fontsize=15)
plt.show()
data2 = data.copy()
# removing because they are close to normal
dist_cols.remove('no_of_week_nights')
dist_cols.remove('no_of_adults')
dist_cols.remove('length_stay')
dist_cols.remove('avg_price_per_room')
# removing because they are boolean or time related.
dist_cols.remove('arrival_year')
dist_cols.remove('required_car_parking_space')
dist_cols.remove('arrival_date')
dist_cols.remove('arrival_month')
dist_cols.remove('repeated_guest')
# removing lead_time
dist_cols.remove('lead_time')
# using log transforms on columns
# note: np.log1p(x) already computes log(1 + x), so this expression evaluates log(x + 2)
for col in dist_cols:
    data2[col + "_log"] = np.log1p(data2[col] + 1)
# dropping the original columns
data2.drop(dist_cols, axis=1, inplace=True)
data2.head()
| no_of_adults | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | avg_price_per_room | booking_status | length_stay | no_of_children_log | no_of_weekend_nights_log | no_of_previous_cancellations_log | no_of_previous_bookings_not_canceled_log | no_of_special_requests_log | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 65.00 | False | 3 | 0.693147 | 1.098612 | 0.693147 | 0.693147 | 0.693147 |
| 1 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 106.68 | False | 5 | 0.693147 | 1.386294 | 0.693147 | 0.693147 | 1.098612 |
| 2 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 60.00 | True | 3 | 0.693147 | 1.386294 | 0.693147 | 0.693147 | 0.693147 |
| 3 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 100.00 | True | 2 | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 |
| 4 | 2 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 94.50 | True | 2 | 0.693147 | 1.098612 | 0.693147 | 0.693147 | 0.693147 |
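The `_log` columns above show 0.693147 wherever the raw count is zero. A short check, using only NumPy and a made-up array of counts, shows where that value comes from: `np.log1p` already adds 1 before taking the logarithm, so adding 1 again shifts every value to log(x + 2).

```python
import numpy as np

counts = np.array([0, 1, 2])

# log1p(x) already computes log(1 + x), so a zero count stays at zero.
plain = np.log1p(counts)

# log1p(x + 1) computes log(2 + x), which is what the notebook cell evaluates.
shifted = np.log1p(counts + 1)

print(plain)
print(shifted)
```

Either form is monotonic, so the ordering of values is unchanged; only the offset differs, which matters when coefficients on these columns are interpreted.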
# viewing the distributions after the log transformation
dist_cols = [
item for item in data2.select_dtypes(include=np.number).columns
]
# plot a histogram (with KDE) of every numeric column
plt.figure(figsize=(15, 45))
for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    sns.histplot(data=data2, x=dist_cols[i], bins=50, kde=True)
    plt.tight_layout()
    plt.title(dist_cols[i], fontsize=15)
plt.show()
# categorical variables
dummy_data = pd.get_dummies(
data2,
columns = [
'type_of_meal_plan',
'room_type_reserved',
'market_segment_type',
],
drop_first=True,
)
dummy_data.head()
| no_of_adults | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | avg_price_per_room | booking_status | length_stay | no_of_children_log | no_of_weekend_nights_log | no_of_previous_cancellations_log | no_of_previous_bookings_not_canceled_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 0 | 224 | 2017 | 10 | 2 | 0 | 65.00 | False | 3 | 0.693147 | 1.098612 | 0.693147 | 0.693147 | 0.693147 | False | False | False | False | False | False | False | False | False | False | False | True | False |
| 1 | 2 | 3 | 0 | 5 | 2018 | 11 | 6 | 0 | 106.68 | False | 5 | 0.693147 | 1.386294 | 0.693147 | 0.693147 | 1.098612 | False | False | True | False | False | False | False | False | False | False | False | False | True |
| 2 | 1 | 1 | 0 | 1 | 2018 | 2 | 28 | 0 | 60.00 | True | 3 | 0.693147 | 1.386294 | 0.693147 | 0.693147 | 0.693147 | False | False | False | False | False | False | False | False | False | False | False | False | True |
| 3 | 2 | 2 | 0 | 211 | 2018 | 5 | 20 | 0 | 100.00 | True | 2 | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 | False | False | False | False | False | False | False | False | False | False | False | False | True |
| 4 | 2 | 1 | 0 | 48 | 2018 | 4 | 11 | 0 | 94.50 | True | 2 | 0.693147 | 1.098612 | 0.693147 | 0.693147 | 0.693147 | False | False | True | False | False | False | False | False | False | False | False | False | True |
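With `drop_first=True`, `pd.get_dummies` omits the first category (in sorted order) of each encoded column, and that category becomes the baseline against which the remaining dummies are read. The sketch below uses a toy frame; its category values are assumptions for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame; the category values are illustrative only.
toy = pd.DataFrame(
    {"market_segment_type": ["Aviation", "Online", "Offline", "Online"]}
)

encoded = pd.get_dummies(toy, columns=["market_segment_type"], drop_first=True)

# Three categories produce two dummy columns; "Aviation" is the dropped baseline.
print(encoded.columns.tolist())
print(encoded)
```

A row whose dummies are all False therefore belongs to the baseline category, not to a missing value.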
dummy_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 29 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   no_of_adults                              36275 non-null  int64
 1   no_of_week_nights                         36275 non-null  int64
 2   required_car_parking_space                36275 non-null  int64
 3   lead_time                                 36275 non-null  int64
 4   arrival_year                              36275 non-null  int64
 5   arrival_month                             36275 non-null  int64
 6   arrival_date                              36275 non-null  int64
 7   repeated_guest                            36275 non-null  int64
 8   avg_price_per_room                        36275 non-null  float64
 9   booking_status                            36275 non-null  object
 10  length_stay                               36275 non-null  int64
 11  no_of_children_log                        36275 non-null  float64
 12  no_of_weekend_nights_log                  36275 non-null  float64
 13  no_of_previous_cancellations_log          36275 non-null  float64
 14  no_of_previous_bookings_not_canceled_log  36275 non-null  float64
 15  no_of_special_requests_log                36275 non-null  float64
 16  type_of_meal_plan_Meal Plan 2             36275 non-null  bool
 17  type_of_meal_plan_Meal Plan 3             36275 non-null  bool
 18  type_of_meal_plan_Not Selected            36275 non-null  bool
 19  room_type_reserved_Room_Type 2            36275 non-null  bool
 20  room_type_reserved_Room_Type 3            36275 non-null  bool
 21  room_type_reserved_Room_Type 4            36275 non-null  bool
 22  room_type_reserved_Room_Type 5            36275 non-null  bool
 23  room_type_reserved_Room_Type 6            36275 non-null  bool
 24  room_type_reserved_Room_Type 7            36275 non-null  bool
 25  market_segment_type_Complementary         36275 non-null  bool
 26  market_segment_type_Corporate             36275 non-null  bool
 27  market_segment_type_Offline               36275 non-null  bool
 28  market_segment_type_Online                36275 non-null  bool
dtypes: bool(13), float64(6), int64(9), object(1)
memory usage: 4.9+ MB
dummied_cut = pd.cut(dummy_data['lead_time'], 5, labels=['lat_min','short','med','long','advanced'])
dummied_cut.head(10)
| lead_time | |
|---|---|
| 0 | med |
| 1 | lat_min |
| 2 | lat_min |
| 3 | med |
| 4 | lat_min |
| 5 | long |
| 6 | lat_min |
| 7 | lat_min |
| 8 | short |
| 9 | lat_min |
data3 = pd.merge(dummy_data, dummied_cut, left_index=True, right_index=True)
data3.head().T
| 0 | 1 | 2 | 3 | 4 | |
|---|---|---|---|---|---|
| no_of_adults | 2 | 2 | 1 | 2 | 2 |
| no_of_week_nights | 2 | 3 | 1 | 2 | 1 |
| required_car_parking_space | 0 | 0 | 0 | 0 | 0 |
| lead_time_x | 224 | 5 | 1 | 211 | 48 |
| arrival_year | 2017 | 2018 | 2018 | 2018 | 2018 |
| arrival_month | 10 | 11 | 2 | 5 | 4 |
| arrival_date | 2 | 6 | 28 | 20 | 11 |
| repeated_guest | 0 | 0 | 0 | 0 | 0 |
| avg_price_per_room | 65.0 | 106.68 | 60.0 | 100.0 | 94.5 |
| booking_status | False | False | True | True | True |
| length_stay | 3 | 5 | 3 | 2 | 2 |
| no_of_children_log | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 |
| no_of_weekend_nights_log | 1.098612 | 1.386294 | 1.386294 | 0.693147 | 1.098612 |
| no_of_previous_cancellations_log | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 |
| no_of_previous_bookings_not_canceled_log | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 |
| no_of_special_requests_log | 0.693147 | 1.098612 | 0.693147 | 0.693147 | 0.693147 |
| type_of_meal_plan_Meal Plan 2 | False | False | False | False | False |
| type_of_meal_plan_Meal Plan 3 | False | False | False | False | False |
| type_of_meal_plan_Not Selected | False | True | False | False | True |
| room_type_reserved_Room_Type 2 | False | False | False | False | False |
| room_type_reserved_Room_Type 3 | False | False | False | False | False |
| room_type_reserved_Room_Type 4 | False | False | False | False | False |
| room_type_reserved_Room_Type 5 | False | False | False | False | False |
| room_type_reserved_Room_Type 6 | False | False | False | False | False |
| room_type_reserved_Room_Type 7 | False | False | False | False | False |
| market_segment_type_Complementary | False | False | False | False | False |
| market_segment_type_Corporate | False | False | False | False | False |
| market_segment_type_Offline | True | False | False | False | False |
| market_segment_type_Online | False | True | True | True | True |
| lead_time_y | med | lat_min | lat_min | med | lat_min |
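The `lead_time_x` and `lead_time_y` names above are the suffixes pandas adds when both sides of a merge carry a column with the same name. One way to avoid them, sketched here on a toy frame, is to rename the binned Series before joining it; the name `lead_time_bin` and the sample values are assumptions made for this example.

```python
import pandas as pd

# Toy frame standing in for dummy_data; the values are illustrative only.
df = pd.DataFrame({"lead_time": [224, 5, 1, 211, 48, 443]})

labels = ["lat_min", "short", "med", "long", "advanced"]
# pd.cut with an integer splits the observed range into equal-width bins.
binned = pd.cut(df["lead_time"], 5, labels=labels)

# Renaming the Series before joining keeps the original column name intact.
df_binned = df.join(binned.rename("lead_time_bin"))
print(df_binned)
```

Because the bins are equal-width over the observed range, a skewed variable places most rows in the lowest bin; `pd.qcut` is the equal-frequency alternative if balanced groups are preferred.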
# dropping time variables.
data3_5 = data3.drop(['lead_time_x','arrival_date', 'arrival_year'], axis=1)
data4 = pd.get_dummies (
data3_5,
columns = [
'lead_time_y',
],
drop_first=True,
)
data4.head().T
| 0 | 1 | 2 | 3 | 4 | |
|---|---|---|---|---|---|
| no_of_adults | 2 | 2 | 1 | 2 | 2 |
| no_of_week_nights | 2 | 3 | 1 | 2 | 1 |
| required_car_parking_space | 0 | 0 | 0 | 0 | 0 |
| arrival_month | 10 | 11 | 2 | 5 | 4 |
| repeated_guest | 0 | 0 | 0 | 0 | 0 |
| avg_price_per_room | 65.0 | 106.68 | 60.0 | 100.0 | 94.5 |
| booking_status | False | False | True | True | True |
| length_stay | 3 | 5 | 3 | 2 | 2 |
| no_of_children_log | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 |
| no_of_weekend_nights_log | 1.098612 | 1.386294 | 1.386294 | 0.693147 | 1.098612 |
| no_of_previous_cancellations_log | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 |
| no_of_previous_bookings_not_canceled_log | 0.693147 | 0.693147 | 0.693147 | 0.693147 | 0.693147 |
| no_of_special_requests_log | 0.693147 | 1.098612 | 0.693147 | 0.693147 | 0.693147 |
| type_of_meal_plan_Meal Plan 2 | False | False | False | False | False |
| type_of_meal_plan_Meal Plan 3 | False | False | False | False | False |
| type_of_meal_plan_Not Selected | False | True | False | False | True |
| room_type_reserved_Room_Type 2 | False | False | False | False | False |
| room_type_reserved_Room_Type 3 | False | False | False | False | False |
| room_type_reserved_Room_Type 4 | False | False | False | False | False |
| room_type_reserved_Room_Type 5 | False | False | False | False | False |
| room_type_reserved_Room_Type 6 | False | False | False | False | False |
| room_type_reserved_Room_Type 7 | False | False | False | False | False |
| market_segment_type_Complementary | False | False | False | False | False |
| market_segment_type_Corporate | False | False | False | False | False |
| market_segment_type_Offline | True | False | False | False | False |
| market_segment_type_Online | False | True | True | True | True |
| lead_time_y_short | False | False | False | False | False |
| lead_time_y_med | True | False | False | True | False |
| lead_time_y_long | False | False | False | False | False |
| lead_time_y_advanced | False | False | False | False | False |
data4 = data4.astype(float)
data4.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 30 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   no_of_adults                              36275 non-null  float64
 1   no_of_week_nights                         36275 non-null  float64
 2   required_car_parking_space                36275 non-null  float64
 3   arrival_month                             36275 non-null  float64
 4   repeated_guest                            36275 non-null  float64
 5   avg_price_per_room                        36275 non-null  float64
 6   booking_status                            36275 non-null  float64
 7   length_stay                               36275 non-null  float64
 8   no_of_children_log                        36275 non-null  float64
 9   no_of_weekend_nights_log                  36275 non-null  float64
 10  no_of_previous_cancellations_log          36275 non-null  float64
 11  no_of_previous_bookings_not_canceled_log  36275 non-null  float64
 12  no_of_special_requests_log                36275 non-null  float64
 13  type_of_meal_plan_Meal Plan 2             36275 non-null  float64
 14  type_of_meal_plan_Meal Plan 3             36275 non-null  float64
 15  type_of_meal_plan_Not Selected            36275 non-null  float64
 16  room_type_reserved_Room_Type 2            36275 non-null  float64
 17  room_type_reserved_Room_Type 3            36275 non-null  float64
 18  room_type_reserved_Room_Type 4            36275 non-null  float64
 19  room_type_reserved_Room_Type 5            36275 non-null  float64
 20  room_type_reserved_Room_Type 6            36275 non-null  float64
 21  room_type_reserved_Room_Type 7            36275 non-null  float64
 22  market_segment_type_Complementary         36275 non-null  float64
 23  market_segment_type_Corporate             36275 non-null  float64
 24  market_segment_type_Offline               36275 non-null  float64
 25  market_segment_type_Online                36275 non-null  float64
 26  lead_time_y_short                         36275 non-null  float64
 27  lead_time_y_med                           36275 non-null  float64
 28  lead_time_y_long                          36275 non-null  float64
 29  lead_time_y_advanced                      36275 non-null  float64
dtypes: float64(30)
memory usage: 8.3 MB
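# separating the predictors and the target before the train/test split
X = data4.drop("booking_status", axis=1)
# indexing (rather than pop) leaves booking_status in data4 for the EDA that follows
y = data4["booking_status"]
# adding a constant to the X variables
X = add_constant(X)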
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Number of rows in train data =", x_train.shape[0])
print("Number of rows in test data =", x_test.shape[0])
Number of rows in train data = 29020
Number of rows in test data = 7255
x_train.info()
<class 'pandas.core.frame.DataFrame'>
Index: 29020 entries, 25629 to 15795
Data columns (total 30 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   const                                     29020 non-null  float64
 1   no_of_adults                              29020 non-null  float64
 2   no_of_week_nights                         29020 non-null  float64
 3   required_car_parking_space                29020 non-null  float64
 4   arrival_month                             29020 non-null  float64
 5   repeated_guest                            29020 non-null  float64
 6   avg_price_per_room                        29020 non-null  float64
 7   length_stay                               29020 non-null  float64
 8   no_of_children_log                        29020 non-null  float64
 9   no_of_weekend_nights_log                  29020 non-null  float64
 10  no_of_previous_cancellations_log          29020 non-null  float64
 11  no_of_previous_bookings_not_canceled_log  29020 non-null  float64
 12  no_of_special_requests_log                29020 non-null  float64
 13  type_of_meal_plan_Meal Plan 2             29020 non-null  float64
 14  type_of_meal_plan_Meal Plan 3             29020 non-null  float64
 15  type_of_meal_plan_Not Selected            29020 non-null  float64
 16  room_type_reserved_Room_Type 2            29020 non-null  float64
 17  room_type_reserved_Room_Type 3            29020 non-null  float64
 18  room_type_reserved_Room_Type 4            29020 non-null  float64
 19  room_type_reserved_Room_Type 5            29020 non-null  float64
 20  room_type_reserved_Room_Type 6            29020 non-null  float64
 21  room_type_reserved_Room_Type 7            29020 non-null  float64
 22  market_segment_type_Complementary         29020 non-null  float64
 23  market_segment_type_Corporate             29020 non-null  float64
 24  market_segment_type_Offline               29020 non-null  float64
 25  market_segment_type_Online                29020 non-null  float64
 26  lead_time_y_short                         29020 non-null  float64
 27  lead_time_y_med                           29020 non-null  float64
 28  lead_time_y_long                          29020 non-null  float64
 29  lead_time_y_advanced                      29020 non-null  float64
dtypes: float64(30)
memory usage: 6.9 MB
EDA¶
- It is a good idea to explore the data once again after manipulating it.
plt.figure(figsize=(20,10))
sns.heatmap(
data4.corr(), annot=True, vmin=-1, vmax=1, fmt='.2f')
<Axes: >
sns.pairplot(data4[['no_of_adults',
'required_car_parking_space',
'arrival_month',
'repeated_guest',
'avg_price_per_room',
'length_stay',
'no_of_children_log',
'no_of_previous_cancellations_log',
'no_of_previous_bookings_not_canceled_log',
'no_of_special_requests_log',
'market_segment_type_Complementary',
'market_segment_type_Corporate',
'market_segment_type_Offline',
'market_segment_type_Online',
'lead_time_y_short',
'lead_time_y_med',
'lead_time_y_long',
'lead_time_y_advanced']]);
EDA Insights
- Online booked rooms have the highest cost of booking.
- Late Summer is the busiest time of the year for the hotel chain.
- Nearly 2/3 of bookings come from online sources.
- Typically, rooms rent for around 100 plus or minus 25.
- Guests who make special requests for their stay are significantly less likely to cancel the reservation.
- 72% of adults who book rooms have another adult staying with them.
- Children are rare at the hotels; 92.6% of bookings don’t include children in the rooms.
- 52.6% of bookings include at least one weekend night.
- The breakfast plan is the most popular meal plan.
Checking Multicollinearity¶
- In order to make statistical inferences from a logistic regression model, it is important to ensure that there is no multicollinearity present in the data.
# let's check the VIF of the predictor
vif_series = pd.Series(
[variance_inflation_factor(x_train.values, i) for i in range(x_train.shape[1])],
index=x_train.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                       834.261073
no_of_adults                                  1.341576
no_of_week_nights                           193.304696
required_car_parking_space                    1.041068
arrival_month                                 1.051504
repeated_guest                                2.847785
avg_price_per_room                            1.934505
length_stay                                 287.553100
no_of_children_log                            1.910321
no_of_weekend_nights_log                     68.425417
no_of_previous_cancellations_log              1.535978
no_of_previous_bookings_not_canceled_log      2.993684
no_of_special_requests_log                    1.265732
type_of_meal_plan_Meal Plan 2                 1.212041
type_of_meal_plan_Meal Plan 3                 1.014148
type_of_meal_plan_Not Selected                1.240739
room_type_reserved_Room_Type 2                1.087549
room_type_reserved_Room_Type 3                1.003615
room_type_reserved_Room_Type 4                1.359499
room_type_reserved_Room_Type 5                1.032541
room_type_reserved_Room_Type 6                1.893696
room_type_reserved_Room_Type 7                1.103061
market_segment_type_Complementary             4.540103
market_segment_type_Corporate                17.397158
market_segment_type_Offline                  65.070216
market_segment_type_Online                   72.414962
lead_time_y_short                             1.119323
lead_time_y_med                               1.106042
lead_time_y_long                              1.144803
lead_time_y_advanced                          1.046140
dtype: float64
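The large values for `length_stay`, `no_of_week_nights`, and `no_of_weekend_nights_log` are what one would expect when a combined feature is kept alongside its components. The sketch below computes VIF by hand as 1 / (1 - R²), regressing each column on the others, using synthetic data in which `total` is almost exactly `a + b`. The data and the helper `vif` are assumptions for illustration, not the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Synthetic predictors: total is the sum of a and b plus tiny noise,
# mimicking a combined feature kept alongside its components.
a = rng.normal(size=n)
b = rng.normal(size=n)
total = a + b + rng.normal(scale=0.01, size=n)
c = rng.normal(size=n)  # unrelated predictor
X = np.column_stack([np.ones(n), a, b, total, c])


def vif(X: np.ndarray, j: int) -> float:
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ coef
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)


for name, j in [("a", 1), ("b", 2), ("total", 3), ("c", 4)]:
    print(name, round(vif(X, j), 2))
```

In this toy setup the three linked columns all show very large VIFs while the unrelated column stays close to 1, which is why dropping the components (or the combined column) is usually enough to resolve the issue.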
# dropping the weekend and week night counts because they are combined into length_stay, and the market segment dummies because of their high VIF values
x_train1 = x_train.drop(['no_of_weekend_nights_log',
'no_of_week_nights',
'market_segment_type_Online',
'market_segment_type_Offline',
'market_segment_type_Corporate',
'market_segment_type_Complementary'],
axis=1)
Building a Logistic Regression model¶
logit = sm.Logit(y_train, x_train1.astype(float)).fit()
logit.summary()
Optimization terminated successfully.
Current function value: 0.460614
Iterations 10
| Dep. Variable: | booking_status | No. Observations: | 29020 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 28996 |
| Method: | MLE | Df Model: | 23 |
| Date: | Fri, 27 Sep 2024 | Pseudo R-squ.: | 0.2706 |
| Time: | 20:50:39 | Log-Likelihood: | -13367. |
| converged: | True | LL-Null: | -18327. |
| Covariance Type: | nonrobust | LLR p-value: | 0.000 |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | -2.7560 | 0.435 | -6.334 | 0.000 | -3.609 | -1.903 |
| no_of_adults | 0.2461 | 0.033 | 7.465 | 0.000 | 0.181 | 0.311 |
| required_car_parking_space | -1.4865 | 0.129 | -11.518 | 0.000 | -1.740 | -1.234 |
| arrival_month | -0.0660 | 0.005 | -12.259 | 0.000 | -0.076 | -0.055 |
| repeated_guest | -2.5765 | 0.532 | -4.842 | 0.000 | -3.619 | -1.534 |
| avg_price_per_room | 0.0228 | 0.001 | 36.055 | 0.000 | 0.022 | 0.024 |
| length_stay | 0.1161 | 0.009 | 13.436 | 0.000 | 0.099 | 0.133 |
| no_of_children_log | 0.8147 | 0.145 | 5.626 | 0.000 | 0.531 | 1.098 |
| no_of_previous_cancellations_log | 1.4853 | 0.497 | 2.988 | 0.003 | 0.511 | 2.460 |
| no_of_previous_bookings_not_canceled_log | -0.4811 | 0.499 | -0.965 | 0.335 | -1.458 | 0.496 |
| no_of_special_requests_log | -3.2166 | 0.068 | -47.388 | 0.000 | -3.350 | -3.084 |
| type_of_meal_plan_Meal Plan 2 | -0.3154 | 0.053 | -5.955 | 0.000 | -0.419 | -0.212 |
| type_of_meal_plan_Meal Plan 3 | 1.7119 | 2.879 | 0.595 | 0.552 | -3.931 | 7.355 |
| type_of_meal_plan_Not Selected | 0.8094 | 0.045 | 17.976 | 0.000 | 0.721 | 0.898 |
| room_type_reserved_Room_Type 2 | 0.0980 | 0.116 | 0.847 | 0.397 | -0.129 | 0.325 |
| room_type_reserved_Room_Type 3 | -0.4403 | 1.308 | -0.337 | 0.736 | -3.004 | 2.124 |
| room_type_reserved_Room_Type 4 | 0.0543 | 0.047 | 1.159 | 0.246 | -0.038 | 0.146 |
| room_type_reserved_Room_Type 5 | -0.8687 | 0.180 | -4.833 | 0.000 | -1.221 | -0.516 |
| room_type_reserved_Room_Type 6 | -1.0992 | 0.129 | -8.510 | 0.000 | -1.352 | -0.846 |
| room_type_reserved_Room_Type 7 | -1.7658 | 0.277 | -6.373 | 0.000 | -2.309 | -1.223 |
| lead_time_y_short | 1.3149 | 0.036 | 36.224 | 0.000 | 1.244 | 1.386 |
| lead_time_y_med | 2.8013 | 0.054 | 52.064 | 0.000 | 2.696 | 2.907 |
| lead_time_y_long | 3.0378 | 0.073 | 41.688 | 0.000 | 2.895 | 3.181 |
| lead_time_y_advanced | 4.5385 | 0.235 | 19.286 | 0.000 | 4.077 | 5.000 |
vif_series = pd.Series(
[variance_inflation_factor(x_train1.values, i) for i in range(x_train1.shape[1])],
index=x_train1.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                       181.913628
no_of_adults                                  1.276251
required_car_parking_space                    1.036220
arrival_month                                 1.045303
repeated_guest                                2.704601
avg_price_per_room                            1.577441
length_stay                                   1.078990
no_of_children_log                            1.901144
no_of_previous_cancellations_log              1.514158
no_of_previous_bookings_not_canceled_log      2.944955
no_of_special_requests_log                    1.135144
type_of_meal_plan_Meal Plan 2                 1.129908
type_of_meal_plan_Meal Plan 3                 1.009422
type_of_meal_plan_Not Selected                1.107915
room_type_reserved_Room_Type 2                1.075407
room_type_reserved_Room_Type 3                1.000657
room_type_reserved_Room_Type 4                1.313223
room_type_reserved_Room_Type 5                1.015376
room_type_reserved_Room_Type 6                1.864319
room_type_reserved_Room_Type 7                1.061829
lead_time_y_short                             1.104850
lead_time_y_med                               1.091866
lead_time_y_long                              1.116167
lead_time_y_advanced                          1.042915
dtype: float64
# training performance: classify as canceled when the predicted probability exceeds 0.5
pred_train = (logit.predict(x_train1) > 0.5).astype(int)
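Thresholded predictions such as `pred_train` are usually compared against the true labels through a confusion matrix. A minimal sketch with `pd.crosstab` follows; the labels and probabilities are made up for illustration, and the 0.5 threshold simply mirrors the cell above.

```python
import pandas as pd

# Made-up labels and predicted probabilities, for illustration only.
actual = pd.Series([1, 0, 1, 1, 0, 0, 1, 0], name="actual")
prob = pd.Series([0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3])

# Same rule as the notebook cell: predict "canceled" above 0.5.
predicted = (prob > 0.5).astype(int).rename("predicted")

confusion = pd.crosstab(actual, predicted)
tp = confusion.loc[1, 1]  # canceled and predicted canceled
fp = confusion.loc[0, 1]  # kept but predicted canceled
fn = confusion.loc[1, 0]  # canceled but predicted kept
tn = confusion.loc[0, 0]  # kept and predicted kept

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(confusion)
print(accuracy, recall, precision)
```

For a cancellation model, recall measures how many actual cancellations are caught, while precision measures how many flagged bookings really cancel; which one to favor depends on the cost of each kind of error.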
X_train2 = x_train1.drop(['room_type_reserved_Room_Type 3'], axis=1)
X_train2.info()
<class 'pandas.core.frame.DataFrame'>
Index: 29020 entries, 25629 to 15795
Data columns (total 23 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   const                                     29020 non-null  float64
 1   no_of_adults                              29020 non-null  float64
 2   required_car_parking_space                29020 non-null  float64
 3   arrival_month                             29020 non-null  float64
 4   repeated_guest                            29020 non-null  float64
 5   avg_price_per_room                        29020 non-null  float64
 6   length_stay                               29020 non-null  float64
 7   no_of_children_log                        29020 non-null  float64
 8   no_of_previous_cancellations_log          29020 non-null  float64
 9   no_of_previous_bookings_not_canceled_log  29020 non-null  float64
 10  no_of_special_requests_log                29020 non-null  float64
 11  type_of_meal_plan_Meal Plan 2             29020 non-null  float64
 12  type_of_meal_plan_Meal Plan 3             29020 non-null  float64
 13  type_of_meal_plan_Not Selected            29020 non-null  float64
 14  room_type_reserved_Room_Type 2            29020 non-null  float64
 15  room_type_reserved_Room_Type 4            29020 non-null  float64
 16  room_type_reserved_Room_Type 5            29020 non-null  float64
 17  room_type_reserved_Room_Type 6            29020 non-null  float64
 18  room_type_reserved_Room_Type 7            29020 non-null  float64
 19  lead_time_y_short                         29020 non-null  float64
 20  lead_time_y_med                           29020 non-null  float64
 21  lead_time_y_long                          29020 non-null  float64
 22  lead_time_y_advanced                      29020 non-null  float64
dtypes: float64(23)
memory usage: 5.3 MB
logit2 = sm.Logit(y_train, X_train2.astype(float)).fit()
logit2.summary()
Optimization terminated successfully.
Current function value: 0.460616
Iterations 10
| Dep. Variable: | booking_status | No. Observations: | 29020 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 28997 |
| Method: | MLE | Df Model: | 22 |
| Date: | Fri, 27 Sep 2024 | Pseudo R-squ.: | 0.2706 |
| Time: | 20:50:57 | Log-Likelihood: | -13367. |
| converged: | True | LL-Null: | -18327. |
| Covariance Type: | nonrobust | LLR p-value: | 0.000 |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | -2.7560 | 0.435 | -6.334 | 0.000 | -3.609 | -1.903 |
| no_of_adults | 0.2461 | 0.033 | 7.464 | 0.000 | 0.181 | 0.311 |
| required_car_parking_space | -1.4865 | 0.129 | -11.517 | 0.000 | -1.739 | -1.234 |
| arrival_month | -0.0660 | 0.005 | -12.262 | 0.000 | -0.077 | -0.055 |
| repeated_guest | -2.5764 | 0.532 | -4.842 | 0.000 | -3.619 | -1.534 |
| avg_price_per_room | 0.0228 | 0.001 | 36.056 | 0.000 | 0.022 | 0.024 |
| length_stay | 0.1161 | 0.009 | 13.438 | 0.000 | 0.099 | 0.133 |
| no_of_children_log | 0.8148 | 0.145 | 5.627 | 0.000 | 0.531 | 1.099 |
| no_of_previous_cancellations_log | 1.4853 | 0.497 | 2.988 | 0.003 | 0.511 | 2.459 |
| no_of_previous_bookings_not_canceled_log | -0.4811 | 0.499 | -0.965 | 0.335 | -1.458 | 0.496 |
| no_of_special_requests_log | -3.2165 | 0.068 | -47.387 | 0.000 | -3.350 | -3.083 |
| type_of_meal_plan_Meal Plan 2 | -0.3153 | 0.053 | -5.953 | 0.000 | -0.419 | -0.211 |
| type_of_meal_plan_Meal Plan 3 | 1.7120 | 2.879 | 0.595 | 0.552 | -3.931 | 7.355 |
| type_of_meal_plan_Not Selected | 0.8095 | 0.045 | 17.978 | 0.000 | 0.721 | 0.898 |
| room_type_reserved_Room_Type 2 | 0.0981 | 0.116 | 0.847 | 0.397 | -0.129 | 0.325 |
| room_type_reserved_Room_Type 4 | 0.0544 | 0.047 | 1.161 | 0.246 | -0.037 | 0.146 |
| room_type_reserved_Room_Type 5 | -0.8686 | 0.180 | -4.832 | 0.000 | -1.221 | -0.516 |
| room_type_reserved_Room_Type 6 | -1.0992 | 0.129 | -8.510 | 0.000 | -1.352 | -0.846 |
| room_type_reserved_Room_Type 7 | -1.7658 | 0.277 | -6.373 | 0.000 | -2.309 | -1.223 |
| lead_time_y_short | 1.3147 | 0.036 | 36.223 | 0.000 | 1.244 | 1.386 |
| lead_time_y_med | 2.8013 | 0.054 | 52.065 | 0.000 | 2.696 | 2.907 |
| lead_time_y_long | 3.0379 | 0.073 | 41.689 | 0.000 | 2.895 | 3.181 |
| lead_time_y_advanced | 4.5385 | 0.235 | 19.286 | 0.000 | 4.077 | 5.000 |
x_train3 = X_train2.drop(['no_of_previous_bookings_not_canceled_log'], axis=1)
logit3 = sm.Logit(y_train, x_train3.astype(float)).fit()
logit3.summary()
Optimization terminated successfully.
Current function value: 0.460635
Iterations 9
| Dep. Variable: | booking_status | No. Observations: | 29020 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 28998 |
| Method: | MLE | Df Model: | 21 |
| Date: | Fri, 27 Sep 2024 | Pseudo R-squ.: | 0.2706 |
| Time: | 20:51:11 | Log-Likelihood: | -13368. |
| converged: | True | LL-Null: | -18327. |
| Covariance Type: | nonrobust | LLR p-value: | 0.000 |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | -2.9881 | 0.339 | -8.819 | 0.000 | -3.652 | -2.324 |
| no_of_adults | 0.2465 | 0.033 | 7.476 | 0.000 | 0.182 | 0.311 |
| required_car_parking_space | -1.4868 | 0.129 | -11.520 | 0.000 | -1.740 | -1.234 |
| arrival_month | -0.0659 | 0.005 | -12.259 | 0.000 | -0.076 | -0.055 |
| repeated_guest | -2.8099 | 0.485 | -5.794 | 0.000 | -3.760 | -1.859 |
| avg_price_per_room | 0.0228 | 0.001 | 36.069 | 0.000 | 0.022 | 0.024 |
| length_stay | 0.1161 | 0.009 | 13.440 | 0.000 | 0.099 | 0.133 |
| no_of_children_log | 0.8151 | 0.145 | 5.629 | 0.000 | 0.531 | 1.099 |
| no_of_previous_cancellations_log | 1.3372 | 0.450 | 2.974 | 0.003 | 0.456 | 2.218 |
| no_of_special_requests_log | -3.2178 | 0.068 | -47.410 | 0.000 | -3.351 | -3.085 |
| type_of_meal_plan_Meal Plan 2 | -0.3156 | 0.053 | -5.958 | 0.000 | -0.419 | -0.212 |
| type_of_meal_plan_Meal Plan 3 | 1.7126 | 2.881 | 0.594 | 0.552 | -3.934 | 7.359 |
| type_of_meal_plan_Not Selected | 0.8098 | 0.045 | 17.983 | 0.000 | 0.722 | 0.898 |
| room_type_reserved_Room_Type 2 | 0.0981 | 0.116 | 0.848 | 0.397 | -0.129 | 0.325 |
| room_type_reserved_Room_Type 4 | 0.0542 | 0.047 | 1.157 | 0.247 | -0.038 | 0.146 |
| room_type_reserved_Room_Type 5 | -0.8695 | 0.180 | -4.838 | 0.000 | -1.222 | -0.517 |
| room_type_reserved_Room_Type 6 | -1.0998 | 0.129 | -8.514 | 0.000 | -1.353 | -0.847 |
| room_type_reserved_Room_Type 7 | -1.7668 | 0.277 | -6.376 | 0.000 | -2.310 | -1.224 |
| lead_time_y_short | 1.3150 | 0.036 | 36.229 | 0.000 | 1.244 | 1.386 |
| lead_time_y_med | 2.8024 | 0.054 | 52.088 | 0.000 | 2.697 | 2.908 |
| lead_time_y_long | 3.0386 | 0.073 | 41.694 | 0.000 | 2.896 | 3.181 |
| lead_time_y_advanced | 4.5467 | 0.236 | 19.291 | 0.000 | 4.085 | 5.009 |
vif_series = pd.Series(
[variance_inflation_factor(x_train3.values, i) for i in range(x_train3.shape[1])],
index=x_train3.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                       175.817619
no_of_adults                                  1.273740
required_car_parking_space                    1.035127
arrival_month                                 1.044396
repeated_guest                                1.465019
avg_price_per_room                            1.574018
length_stay                                   1.078836
no_of_children_log                            1.901014
no_of_previous_cancellations_log              1.343447
no_of_special_requests_log                    1.130055
type_of_meal_plan_Meal Plan 2                 1.129777
type_of_meal_plan_Meal Plan 3                 1.009408
type_of_meal_plan_Not Selected                1.107794
room_type_reserved_Room_Type 2                1.075386
room_type_reserved_Room_Type 4                1.312711
room_type_reserved_Room_Type 5                1.014418
room_type_reserved_Room_Type 6                1.863570
room_type_reserved_Room_Type 7                1.061215
lead_time_y_short                             1.104778
lead_time_y_med                               1.091840
lead_time_y_long                              1.116088
lead_time_y_advanced                          1.042863
dtype: float64
x_train4 = x_train3.drop(['room_type_reserved_Room_Type 2'], axis=1)
logit4 = sm.Logit(y_train, x_train4.astype(float)).fit()
logit4.summary()
Optimization terminated successfully.
Current function value: 0.460647
Iterations 9
| Dep. Variable: | booking_status | No. Observations: | 29020 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 28999 |
| Method: | MLE | Df Model: | 20 |
| Date: | Fri, 27 Sep 2024 | Pseudo R-squ.: | 0.2706 |
| Time: | 20:51:24 | Log-Likelihood: | -13368. |
| converged: | True | LL-Null: | -18327. |
| Covariance Type: | nonrobust | LLR p-value: | 0.000 |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | -2.9998 | 0.339 | -8.860 | 0.000 | -3.663 | -2.336 |
| no_of_adults | 0.2455 | 0.033 | 7.450 | 0.000 | 0.181 | 0.310 |
| required_car_parking_space | -1.4839 | 0.129 | -11.508 | 0.000 | -1.737 | -1.231 |
| arrival_month | -0.0661 | 0.005 | -12.288 | 0.000 | -0.077 | -0.056 |
| repeated_guest | -2.8119 | 0.485 | -5.798 | 0.000 | -3.762 | -1.861 |
| avg_price_per_room | 0.0228 | 0.001 | 36.083 | 0.000 | 0.022 | 0.024 |
| length_stay | 0.1162 | 0.009 | 13.454 | 0.000 | 0.099 | 0.133 |
| no_of_children_log | 0.8407 | 0.142 | 5.934 | 0.000 | 0.563 | 1.118 |
| no_of_previous_cancellations_log | 1.3364 | 0.450 | 2.971 | 0.003 | 0.455 | 2.218 |
| no_of_special_requests_log | -3.2162 | 0.068 | -47.407 | 0.000 | -3.349 | -3.083 |
| type_of_meal_plan_Meal Plan 2 | -0.3171 | 0.053 | -5.990 | 0.000 | -0.421 | -0.213 |
| type_of_meal_plan_Meal Plan 3 | 1.7139 | 2.880 | 0.595 | 0.552 | -3.931 | 7.358 |
| type_of_meal_plan_Not Selected | 0.8080 | 0.045 | 17.965 | 0.000 | 0.720 | 0.896 |
| room_type_reserved_Room_Type 4 | 0.0529 | 0.047 | 1.131 | 0.258 | -0.039 | 0.145 |
| room_type_reserved_Room_Type 5 | -0.8715 | 0.180 | -4.849 | 0.000 | -1.224 | -0.519 |
| room_type_reserved_Room_Type 6 | -1.1150 | 0.128 | -8.713 | 0.000 | -1.366 | -0.864 |
| room_type_reserved_Room_Type 7 | -1.7747 | 0.277 | -6.405 | 0.000 | -2.318 | -1.232 |
| lead_time_y_short | 1.3150 | 0.036 | 36.231 | 0.000 | 1.244 | 1.386 |
| lead_time_y_med | 2.8049 | 0.054 | 52.201 | 0.000 | 2.700 | 2.910 |
| lead_time_y_long | 3.0393 | 0.073 | 41.703 | 0.000 | 2.896 | 3.182 |
| lead_time_y_advanced | 4.5477 | 0.236 | 19.293 | 0.000 | 4.086 | 5.010 |
x_train5 = x_train4.drop(['room_type_reserved_Room_Type 4'], axis=1)
logit5 = sm.Logit(y_train, x_train5.astype(float)).fit()
logit5.summary()
Optimization terminated successfully.
Current function value: 0.460669
Iterations 9
| Dep. Variable: | booking_status | No. Observations: | 29020 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 29000 |
| Method: | MLE | Df Model: | 19 |
| Date: | Fri, 27 Sep 2024 | Pseudo R-squ.: | 0.2705 |
| Time: | 20:51:42 | Log-Likelihood: | -13369. |
| converged: | True | LL-Null: | -18327. |
| Covariance Type: | nonrobust | LLR p-value: | 0.000 |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | -3.0179 | 0.338 | -8.916 | 0.000 | -3.681 | -2.354 |
| no_of_adults | 0.2541 | 0.032 | 7.922 | 0.000 | 0.191 | 0.317 |
| required_car_parking_space | -1.4849 | 0.129 | -11.519 | 0.000 | -1.738 | -1.232 |
| arrival_month | -0.0663 | 0.005 | -12.334 | 0.000 | -0.077 | -0.056 |
| repeated_guest | -2.8085 | 0.485 | -5.789 | 0.000 | -3.759 | -1.858 |
| avg_price_per_room | 0.0230 | 0.001 | 38.361 | 0.000 | 0.022 | 0.024 |
| length_stay | 0.1173 | 0.009 | 13.669 | 0.000 | 0.100 | 0.134 |
| no_of_children_log | 0.8249 | 0.141 | 5.849 | 0.000 | 0.549 | 1.101 |
| no_of_previous_cancellations_log | 1.3352 | 0.450 | 2.966 | 0.003 | 0.453 | 2.217 |
| no_of_special_requests_log | -3.2133 | 0.068 | -47.410 | 0.000 | -3.346 | -3.080 |
| type_of_meal_plan_Meal Plan 2 | -0.3263 | 0.052 | -6.236 | 0.000 | -0.429 | -0.224 |
| type_of_meal_plan_Meal Plan 3 | 1.7231 | 2.937 | 0.587 | 0.557 | -4.034 | 7.480 |
| type_of_meal_plan_Not Selected | 0.7971 | 0.044 | 18.154 | 0.000 | 0.711 | 0.883 |
| room_type_reserved_Room_Type 5 | -0.8887 | 0.179 | -4.963 | 0.000 | -1.240 | -0.538 |
| room_type_reserved_Room_Type 6 | -1.1382 | 0.126 | -9.011 | 0.000 | -1.386 | -0.891 |
| room_type_reserved_Room_Type 7 | -1.8140 | 0.275 | -6.596 | 0.000 | -2.353 | -1.275 |
| lead_time_y_short | 1.3134 | 0.036 | 36.222 | 0.000 | 1.242 | 1.385 |
| lead_time_y_med | 2.8006 | 0.054 | 52.257 | 0.000 | 2.696 | 2.906 |
| lead_time_y_long | 3.0337 | 0.073 | 41.714 | 0.000 | 2.891 | 3.176 |
| lead_time_y_advanced | 4.5433 | 0.236 | 19.275 | 0.000 | 4.081 | 5.005 |
Model performance evaluation¶
odds_ratio = np.exp(logit5.params)
pd.DataFrame(odds_ratio, x_train5.columns, columns=["Odds Ratio"]).T
| | const | no_of_adults | required_car_parking_space | arrival_month | repeated_guest | avg_price_per_room | length_stay | no_of_children_log | no_of_previous_cancellations_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | lead_time_y_short | lead_time_y_med | lead_time_y_long | lead_time_y_advanced |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds Ratio | 0.048904 | 1.289349 | 0.22652 | 0.935852 | 0.060298 | 1.023254 | 1.124466 | 2.281694 | 3.80058 | 0.040223 | 0.721571 | 5.601864 | 2.219001 | 0.411194 | 0.320389 | 0.163008 | 3.718981 | 16.453987 | 20.774232 | 93.995823 |
# finding the percentage change
perc_change_odds = (np.exp(logit5.params) - 1) * 100
# adding the change_odds% to a dataframe
pd.DataFrame(perc_change_odds, x_train5.columns, columns=["change_odds%"]).T
| | const | no_of_adults | required_car_parking_space | arrival_month | repeated_guest | avg_price_per_room | length_stay | no_of_children_log | no_of_previous_cancellations_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | lead_time_y_short | lead_time_y_med | lead_time_y_long | lead_time_y_advanced |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| change_odds% | -95.109614 | 28.934884 | -77.348036 | -6.414797 | -93.970239 | 2.325378 | 12.44657 | 128.169403 | 280.058017 | -95.977714 | -27.842909 | 460.186381 | 121.900149 | -58.880576 | -67.961081 | -83.699181 | 271.898128 | 1545.398745 | 1977.423247 | 9299.582334 |
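The two tables above apply the same transformation to every coefficient: exponentiating a logit coefficient gives the odds ratio, and subtracting 1 and multiplying by 100 gives the percentage change in odds. As a minimal worked sketch, the snippet below repeats that arithmetic for `avg_price_per_room`, using the rounded coefficient 0.0230 from the summary table, so the result agrees with the table only to about four decimal places. The 10-unit increase is an illustrative assumption and is not computed anywhere in the notebook.

```python
import numpy as np

# Rounded coefficient for avg_price_per_room, copied from the logit5 summary table.
coef_price = 0.0230

# Odds ratio for a one-unit increase in price, holding the other features fixed.
odds_ratio = np.exp(coef_price)
pct_change = (odds_ratio - 1) * 100

# Effects multiply on the odds scale, so a 10-unit increase (illustrative choice)
# corresponds to exp(10 * coef), not 10 times the one-unit percentage change.
odds_ratio_10 = np.exp(10 * coef_price)
pct_change_10 = (odds_ratio_10 - 1) * 100

print(f"1-unit increase: odds ratio {odds_ratio:.4f}, change {pct_change:.2f}%")
print(f"10-unit increase: odds ratio {odds_ratio_10:.4f}, change {pct_change_10:.2f}%")
```

For the dummy variables, such as the `lead_time_y_*` columns, the percentages describe a change in odds relative to the omitted reference category, not a change in probability.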
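# refitting the final model on the training set (same specification as logit5)
logit = sm.Logit(y_train, x_train5.astype(float))
lg3 = logit.fit()
# predicted probabilities from the final model, rounded to class labels (0.5 threshold)
pred_train4 = logit5.predict(x_train5)
pred_train4 = np.round(pred_train4)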
Optimization terminated successfully.
Current function value: 0.460669
Iterations 9
Final Model Summary¶
print("Accuracy on training set : ", accuracy_score(y_train, pred_train4))
Accuracy on training set : 0.7816678152997932
logit_roc_auc_train = roc_auc_score(y_train, logit5.predict(x_train5))
fpr, tpr, thresholds = roc_curve(y_train, logit5.predict(x_train5))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
X_test1 = x_test.drop([ 'no_of_weekend_nights_log',
'no_of_week_nights',
'market_segment_type_Online',
'market_segment_type_Offline',
'market_segment_type_Corporate',
'market_segment_type_Complementary',
'room_type_reserved_Room_Type 3',
'room_type_reserved_Room_Type 4',
'no_of_previous_bookings_not_canceled_log',
'room_type_reserved_Room_Type 2'
], axis=1)
# classify a booking as canceled when the predicted probability exceeds 0.5
pred_test = (logit5.predict(X_test1) > 0.5).astype(int)
print("Accuracy on training set : ", accuracy_score(y_train, pred_train4))
print("Accuracy on test set : ", accuracy_score(y_test, pred_test))
Accuracy on training set : 0.7816678152997932
Accuracy on test set : 0.7789110957960027
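The accuracies above use a fixed 0.5 cutoff on the predicted probabilities. The ROC curve plotted earlier already contains one (false positive rate, true positive rate) pair per candidate cutoff, so it can also be used to compare alternatives. The following is only a sketch under stated assumptions: `y_true` and `y_prob` are synthetic stand-ins for `y_test` and `logit5.predict(X_test1)`, and Youden's J statistic is one possible selection rule that the notebook itself does not use.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_curve

rng = np.random.default_rng(1)

# Synthetic stand-ins (assumptions): y_true plays the role of y_test and
# y_prob plays the role of logit5.predict(X_test1).
y_true = rng.integers(0, 2, size=2000)
y_prob = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, size=2000), 0, 1)

# roc_curve returns one (fpr, tpr) pair per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# Youden's J statistic: pick the threshold that maximizes tpr - fpr.
best_threshold = thresholds[np.argmax(tpr - fpr)]

for name, t in [("default", 0.5), ("Youden's J", best_threshold)]:
    y_pred = (y_prob >= t).astype(int)
    print(f"{name} threshold {t:.3f}: "
          f"accuracy {accuracy_score(y_true, y_pred):.3f}, "
          f"recall {recall_score(y_true, y_pred):.3f}")
```

Whether a lower cutoff is worthwhile depends on how the hotel weighs missed cancellations against false alarms, which this sketch does not model.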
Building a Decision Tree model¶
tree_data = dummy_data.astype(float)
tree_data = tree_data.drop(['arrival_date','arrival_year','no_of_week_nights',
'no_of_weekend_nights_log' ], axis=1)
x = tree_data.drop("booking_status" , axis=1)
y = tree_data.pop("booking_status")
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.30, random_state=1)
- Using a simplified dataset for the tree.
# building a decision tree using the dtclassifier function
dTree = DecisionTreeClassifier(criterion = 'gini', random_state=1)
dTree.fit(x_train, y_train)
DecisionTreeClassifier(random_state=1)
# scoring the accuracy on train & test data
print("Accuracy on training set : ", dTree.score(x_train, y_train))
print("Accuracy on test set : ", dTree.score(x_test, y_test))
Accuracy on training set : 0.9924385633270322
Accuracy on test set : 0.8585867867315997
# checking the positive outcomes
y.sum(axis = 0)
11885.0
Observations
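- `y.sum()` counts the actual cancellations in the data (11885), not the model's predictions, so comparing it with 11989 does not by itself show whether the model is good.
- The tree scores very well on training accuracy (about 0.99) but lower on the test set (about 0.86); the gap suggests the fully grown tree is overfitting.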
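The gap between training accuracy (about 0.99) and test accuracy (about 0.86) reported above is typical of a fully grown tree. As a hedged illustration of how that gap can be inspected, the sketch below fits a full tree and a depth-limited tree on synthetic data from `make_classification`. The data, the variable names, and `max_depth=5` are all assumptions chosen for the example and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): stands in for the booking features and labels.
X, y_demo = make_classification(n_samples=3000, n_features=10, n_informative=5,
                                flip_y=0.1, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y_demo, test_size=0.30, random_state=1)

# Fully grown tree, configured like dTree in the cell above.
full_tree = DecisionTreeClassifier(criterion="gini", random_state=1).fit(X_tr, y_tr)

# Depth-limited tree; max_depth=5 is an arbitrary illustrative value.
shallow_tree = DecisionTreeClassifier(criterion="gini", max_depth=5,
                                      random_state=1).fit(X_tr, y_tr)

for name, model in [("full", full_tree), ("max_depth=5", shallow_tree)]:
    train_acc = model.score(X_tr, y_tr)
    test_acc = model.score(X_te, y_te)
    print(f"{name}: train {train_acc:.3f}, test {test_acc:.3f}, "
          f"gap {train_acc - test_acc:.3f}")
```

A smaller train-test gap for the restricted tree would indicate less overfitting, though the best depth for the booking data would have to be found by validation.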
# function to create confusion matrix
def make_confusion_matrix(model, y_actual, labels=(0, 1)):
    # predictions are made on the global x_test, so y_actual should be y_test
    y_predict = model.predict(x_test)
    cm = metrics.confusion_matrix(y_actual, y_predict, labels=list(labels))
    df_cm = pd.DataFrame(cm, index=["Actual - No", "Actual - Yes"],
                         columns=["Predicted - No", "Predicted - Yes"])
    # annotate each cell with its count and its share of all observations
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    cell_labels = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    cell_labels = np.asarray(cell_labels).reshape(2, 2)
    plt.figure(figsize=(10, 7))
    sns.heatmap(df_cm, annot=cell_labels, fmt='')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
# function to calculate recall score
def get_recall_score(model):
    pred_train = model.predict(x_train)
    pred_test = model.predict(x_test)
    print("Recall on training set : ", metrics.recall_score(y_train, pred_train))
    print("Recall on test set : ", metrics.recall_score(y_test, pred_test))
# fit a second tree with default settings and plot its confusion matrix on the test set
dtree = DecisionTreeClassifier()
dtree.fit(x_train, y_train)
make_confusion_matrix(dtree, y_test)
# compare recall on the training and test sets
get_recall_score(dtree)
Recall on training set : 0.9817051297381323
Recall on test set : 0.7924474730266894
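Recall is the share of actual cancellations that the model flags, i.e. TP / (TP + FN) read from the "Actual - Yes" row of the confusion matrix. The short sketch below shows that relationship on a toy example. The label arrays are made up for illustration (an assumption) and have no connection to the booking data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels (assumption): 1 = canceled, 0 = not canceled.
y_actual_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

# With labels=[0, 1], rows are actual classes and columns are predicted classes,
# so ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_actual_demo, y_pred_demo, labels=[0, 1]).ravel()

# Recall uses only the actual-positive row: true positives and false negatives.
recall_manual = tp / (tp + fn)
print("Recall from confusion matrix:", recall_manual)
print("Recall from recall_score    :", recall_score(y_actual_demo, y_pred_demo))
```

Both lines print the same value, which is why the recall reported by `get_recall_score` can be cross-checked against the heatmap produced by `make_confusion_matrix`.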
the_features = list(x.columns)
print(the_features)
['no_of_adults', 'required_car_parking_space', 'lead_time', 'arrival_month', 'repeated_guest', 'avg_price_per_room', 'length_stay', 'no_of_children_log', 'no_of_previous_cancellations_log', 'no_of_previous_bookings_not_canceled_log', 'no_of_special_requests_log', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
plt.figure(figsize=(20,30))
tree.plot_tree(dtree,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
print(tree.export_text(dtree,feature_names=the_features,show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests_log <= 0.90 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- length_stay <= 5.50 | | | | | |--- avg_price_per_room <= 201.50 | | | | | | |--- lead_time <= 74.50 | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 62.00 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 62.00 | | | | | | | | | | |--- avg_price_per_room <= 151.59 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 151.59 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- avg_price_per_room <= 138.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 138.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 5.50 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | | |--- weights: [169.00, 0.00] class: 0.0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- 
lead_time > 9.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | |--- avg_price_per_room <= 50.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 50.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | |--- lead_time > 74.50 | | | | | | | |--- lead_time <= 78.50 | | | | | | | | |--- avg_price_per_room <= 95.47 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 69.85 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 69.85 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1.0 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- weights: [26.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 95.47 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 120.24 | | | | | | | | | | | |--- weights: [0.00, 30.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 120.24 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 78.50 | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [110.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- lead_time <= 
86.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 86.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | |--- avg_price_per_room <= 66.75 | | | | | | | | | | |--- avg_price_per_room <= 63.25 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 63.25 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 66.75 | | | | | | | | | | |--- avg_price_per_room <= 73.53 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 73.53 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | |--- avg_price_per_room > 201.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- weights: [0.00, 17.00] class: 1.0 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | |--- length_stay > 5.50 | | | | | |--- avg_price_per_room <= 115.50 | | | | | | |--- length_stay <= 14.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | |--- length_stay <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- length_stay > 11.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- lead_time <= 75.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 75.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- avg_price_per_room <= 70.42 | | | 
| | | | | | |--- weights: [34.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 70.42 | | | | | | | | | |--- avg_price_per_room <= 71.42 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 71.42 | | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- length_stay > 14.50 | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | |--- avg_price_per_room > 115.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- avg_price_per_room <= 137.80 | | | | | | | | |--- length_stay <= 10.00 | | | | | | | | | |--- weights: [0.00, 43.00] class: 1.0 | | | | | | | | |--- length_stay > 10.00 | | | | | | | | | |--- lead_time <= 9.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- lead_time > 9.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 137.80 | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [14.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- lead_time <= 104.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 104.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- avg_price_per_room <= 61.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- 
avg_price_per_room > 61.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | | |--- length_stay <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 88.50 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 80.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 80.50 | | | | | | | | | | | |--- weights: [23.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- weights: [50.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 86.00 | | | | | 
| | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 86.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- lead_time <= 112.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- lead_time > 112.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- length_stay <= 3.50 | | | | | | | | |--- avg_price_per_room <= 117.50 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- weights: [0.00, 59.00] class: 1.0 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 117.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- lead_time <= 101.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 101.00 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 3.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- avg_price_per_room <= 108.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 101.12 | | | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 101.12 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- arrival_month <= 11.00 | | | | | 
| | | | | |--- weights: [0.00, 47.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 108.50 | | | | | | | | |--- lead_time <= 104.00 | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 177.83 | | | | | | | | | | | |--- weights: [45.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 177.83 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- lead_time > 104.00 | | | | | | | | | |--- avg_price_per_room <= 110.86 | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 110.86 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_adults <= 1.50 | | | | | | |--- avg_price_per_room <= 122.00 | | | | | | | |--- weights: [141.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 122.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- no_of_adults > 1.50 | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- avg_price_per_room <= 82.88 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 82.88 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | |--- avg_price_per_room <= 70.12 | | | | | | | | | | |--- lead_time <= 124.00 | | | | | | 
| | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 124.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 70.12 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [65.00, 0.00] class: 0.0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | |--- avg_price_per_room <= 94.75 | | | | | | | | | |--- lead_time <= 125.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 125.50 | | | | | | | | | | |--- lead_time <= 138.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 138.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 94.75 | | | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- arrival_month > 7.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 
| | | | | | | | |--- length_stay > 1.50 | | | | | | | | | |--- avg_price_per_room <= 223.26 | | | | | | | | | | |--- avg_price_per_room <= 97.41 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 97.41 | | | | | | | | | | | |--- weights: [60.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 223.26 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 202.67 | | | | | |--- lead_time <= 3.50 | | | | | | |--- arrival_month <= 5.50 | | | | | | | |--- length_stay <= 6.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [67.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- length_stay > 6.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | |--- arrival_month > 5.50 | | | | | | | |--- length_stay <= 12.00 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 76.35 | | | | | | | | | | |--- avg_price_per_room <= 74.40 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 74.40 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 76.35 | | | | | | | | | | |--- avg_price_per_room <= 118.04 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 118.04 | | | | | | | | | | | |--- 
truncated branch of depth 11 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- avg_price_per_room <= 178.00 | | | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 178.00 | | | | | | | | | | |--- avg_price_per_room <= 182.25 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 182.25 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 12.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 3.50 | | | | | | |--- avg_price_per_room <= 99.38 | | | | | | | |--- avg_price_per_room <= 78.90 | | | | | | | | |--- length_stay <= 15.00 | | | | | | | | | |--- length_stay <= 7.50 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [84.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 7.50 | | | | | | | | | | |--- lead_time <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 7.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 15.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 78.90 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [23.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- length_stay <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- length_stay > 6.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1.0 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [42.00, 0.00] class: 0.0 | | | | | 
| |--- avg_price_per_room > 99.38 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 119.25 | | | | | | | | | | |--- avg_price_per_room <= 117.25 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 117.25 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 119.25 | | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | |--- avg_price_per_room <= 160.17 | | | | | | | | | | | |--- weights: [41.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 160.17 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 10.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0.0 | | | | |--- avg_price_per_room > 202.67 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- weights: [0.00, 32.00] class: 1.0 | | | | | |--- arrival_month > 11.50 | | | | | | |--- weights: [1.00, 0.00] class: 
0.0 | | | |--- lead_time > 13.50 | | | | |--- avg_price_per_room <= 105.27 | | | | | |--- avg_price_per_room <= 60.07 | | | | | | |--- lead_time <= 84.50 | | | | | | | |--- lead_time <= 51.50 | | | | | | | | |--- lead_time <= 50.50 | | | | | | | | | |--- avg_price_per_room <= 21.67 | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 21.67 | | | | | | | | | | |--- avg_price_per_room <= 49.84 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 49.84 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- lead_time > 50.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- lead_time > 51.50 | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | |--- lead_time > 84.50 | | | | | | | |--- lead_time <= 87.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | |--- lead_time > 87.50 | | | | | | | | |--- length_stay <= 8.00 | | | | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- length_stay > 8.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- avg_price_per_room > 60.07 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [29.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- avg_price_per_room <= 69.16 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 69.16 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | 
| | |--- arrival_month > 4.50
| | | | | | | | | | |--- arrival_month <= 10.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- arrival_month > 10.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | |--- arrival_month > 11.50
| | | | | | | | |--- weights: [54.00, 0.00] class: 0.0
| | | | | | |--- lead_time > 25.50
| | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | |--- required_car_parking_space <= 0.50
| | | | | | | | | | |--- avg_price_per_room <= 71.92
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- avg_price_per_room > 71.92
| | | | | | | | | | | |--- truncated branch of depth 23
| | | | | | | | | |--- required_car_parking_space > 0.50
| | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | |--- arrival_month <= 5.00
| | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | | |--- arrival_month > 5.00
| | | | | | | | | | |--- lead_time <= 57.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 57.50
| | | | | | | | | | | |--- weights: [0.00, 35.00] class: 1.0
| | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | |--- required_car_parking_space <= 0.50
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- avg_price_per_room <= 90.20
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- avg_price_per_room > 90.20
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- avg_price_per_room <= 74.53
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | | |--- avg_price_per_room > 74.53
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | | |--- required_car_parking_space > 0.50
| | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0
| | | | |--- avg_price_per_room > 105.27
| | | | | |--- required_car_parking_space <= 0.50
| | | | | | |--- arrival_month <= 10.50
| | | | | | | |--- avg_price_per_room <= 195.30
| | | | | | | | |--- lead_time <= 54.50
| | | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | | |--- lead_time <= 38.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | | |--- lead_time > 38.50
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | |--- lead_time > 54.50
| | | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | | |--- lead_time <= 135.50
| | | | | | | | | | | |--- truncated branch of depth 22
| | | | | | | | | | |--- lead_time > 135.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | | |--- lead_time <= 59.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- lead_time > 59.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | |--- avg_price_per_room > 195.30
| | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- lead_time <= 59.50
| | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0
| | | | | | | | | | |--- lead_time > 59.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- weights: [0.00, 92.00] class: 1.0
| | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | |--- arrival_month > 10.50
| | | | | | | |--- lead_time <= 22.50
| | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | |--- weights: [22.00, 0.00] class: 0.0
| | | | | | | |--- lead_time > 22.50
| | | | | | | | |--- avg_price_per_room <= 168.06
| | | | | | | | | |--- avg_price_per_room <= 147.75
| | | | | | | | | | |--- length_stay <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- length_stay > 3.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- avg_price_per_room > 147.75
| | | | | | | | | | |--- weights: [0.00, 15.00] class: 1.0
| | | | | | | | |--- avg_price_per_room > 168.06
| | | | | | | | | |--- length_stay <= 8.50
| | | | | | | | | | |--- lead_time <= 80.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- lead_time > 80.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | |--- length_stay > 8.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | |--- required_car_parking_space > 0.50
| | | | | | |--- length_stay <= 11.00
| | | | | | | |--- weights: [39.00, 0.00] class: 0.0
| | | | | | |--- length_stay > 11.00
| | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| |--- no_of_special_requests_log > 0.90
| | |--- no_of_special_requests_log <= 1.24
| | | |--- market_segment_type_Online <= 0.50
| | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | |--- lead_time <= 102.50
| | | | | | |--- length_stay <= 15.00
| | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | |--- lead_time <= 91.50
| | | | | | | | | |--- avg_price_per_room <= 129.50
| | | | | | | | | | |--- weights: [848.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 129.50
| | | | | | | | | | |--- avg_price_per_room <= 131.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 131.50
| | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 91.50
| | | | | | | | | |--- no_of_children_log <= 0.90
| | | | | | | | | | |--- weights: [43.00, 0.00] class: 0.0
| | | | | | | | | |--- no_of_children_log > 0.90
| | | | | | | | | | |--- arrival_month <= 5.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | | | |--- arrival_month > 5.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | |--- length_stay <= 4.50
| | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0
| | | | | | | | |--- length_stay > 4.50
| | | | | | | | | |--- market_segment_type_Corporate <= 0.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | |--- market_segment_type_Corporate > 0.50
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | |--- length_stay > 15.00
| | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | |--- lead_time > 102.50
| | | | | | |--- lead_time <= 104.50
| | | | | | | |--- lead_time <= 103.50
| | | | | | | | |--- no_of_children_log <= 0.90
| | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0
| | | | | | | | |--- no_of_children_log > 0.90
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | |--- lead_time > 103.50
| | | | | | | | |--- weights: [0.00, 3.00] class: 1.0
| | | | | | |--- lead_time > 104.50
| | | | | | | |--- avg_price_per_room <= 141.75
| | | | | | | | |--- lead_time <= 150.50
| | | | | | | | | |--- length_stay <= 3.50
| | | | | | | | | | |--- avg_price_per_room <= 81.00
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- avg_price_per_room > 81.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- length_stay > 3.50
| | | | | | | | | | |--- weights: [27.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 150.50
| | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | |--- avg_price_per_room > 141.75
| | | | | | | | |--- lead_time <= 110.50
| | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | |--- lead_time > 110.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | |--- lead_time <= 63.00
| | | | | | |--- market_segment_type_Corporate <= 0.50
| | | | | | | |--- weights: [18.00, 0.00] class: 0.0
| | | | | | |--- market_segment_type_Corporate > 0.50
| | | | | | | |--- length_stay <= 1.50
| | | | | | | | |--- weights: [2.00, 1.00] class: 0.0
| | | | | | | |--- length_stay > 1.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | |--- lead_time > 63.00
| | | | | | |--- weights: [0.00, 6.00] class: 1.0
| | | |--- market_segment_type_Online > 0.50
| | | | |--- lead_time <= 8.50
| | | | | |--- lead_time <= 4.50
| | | | | | |--- length_stay <= 14.00
| | | | | | | |--- avg_price_per_room <= 219.86
| | | | | | | | |--- length_stay <= 6.50
| | | | | | | | | |--- avg_price_per_room <= 157.64
| | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 157.64
| | | | | | | | | | |--- avg_price_per_room <= 158.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | | |--- avg_price_per_room > 158.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | |--- length_stay > 6.50
| | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0
| | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | | | |--- avg_price_per_room <= 113.02
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | | |--- avg_price_per_room > 113.02
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | |--- avg_price_per_room > 219.86
| | | | | | | | |--- avg_price_per_room <= 223.58
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | |--- avg_price_per_room > 223.58
| | | | | | | | | |--- avg_price_per_room <= 237.25
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 237.25
| | | | | | | | | | |--- avg_price_per_room <= 244.12
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | | |--- avg_price_per_room > 244.12
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | |--- length_stay > 14.00
| | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | |--- lead_time > 4.50
| | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50
| | | | | | | |--- avg_price_per_room <= 123.60
| | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | |--- avg_price_per_room <= 88.76
| | | | | | | | | | |--- length_stay <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- length_stay > 1.50
| | | | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 88.76
| | | | | | | | | | |--- avg_price_per_room <= 91.22
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- avg_price_per_room > 91.22
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | |--- weights: [95.00, 0.00] class: 0.0
| | | | | | | |--- avg_price_per_room > 123.60
| | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | |--- avg_price_per_room <= 124.05
| | | | | | | | | | |--- length_stay <= 1.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | | | |--- length_stay > 1.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 124.05
| | | | | | | | | | |--- arrival_month <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_month > 2.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | |--- length_stay <= 1.50
| | | | | | | | | | |--- repeated_guest <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- repeated_guest > 0.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | |--- length_stay > 1.50
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0.0
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | |--- room_type_reserved_Room_Type 2 > 0.50
| | | | | | | |--- avg_price_per_room <= 94.48
| | | | | | | | |--- weights: [0.00, 3.00] class: 1.0
| | | | | | | |--- avg_price_per_room > 94.48
| | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | |--- lead_time > 8.50
| | | | | |--- required_car_parking_space <= 0.50
| | | | | | |--- avg_price_per_room <= 127.62
| | | | | | | |--- lead_time <= 43.50
| | | | | | | | |--- length_stay <= 9.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | | |--- weights: [87.00, 0.00] class: 0.0
| | | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | | |--- truncated branch of depth 24
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [127.00, 0.00] class: 0.0
| | | | | | | | |--- length_stay > 9.50
| | | | | | | | | |--- lead_time <= 29.50
| | | | | | | | | | |--- avg_price_per_room <= 76.22
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0
| | | | | | | | | | |--- avg_price_per_room > 76.22
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- lead_time > 29.50
| | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0
| | | | | | | |--- lead_time > 43.50
| | | | | | | | |--- length_stay <= 10.50
| | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | |--- avg_price_per_room <= 76.54
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | | |--- avg_price_per_room > 76.54
| | | | | | | | | | | |--- truncated branch of depth 23
| | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | |--- length_stay > 10.50
| | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0
| | | | | | |--- avg_price_per_room > 127.62
| | | | | | | |--- lead_time <= 142.50
| | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | |--- avg_price_per_room <= 179.62
| | | | | | | | | | |--- lead_time <= 11.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- lead_time > 11.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | | |--- avg_price_per_room > 179.62
| | | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- lead_time <= 139.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | | | |--- lead_time > 139.50
| | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0.0
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- lead_time <= 100.50
| | | | | | | | | | | |--- weights: [49.00, 0.00] class: 0.0
| | | | | | | | | | |--- lead_time > 100.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- lead_time > 142.50
| | | | | | | | |--- avg_price_per_room <= 142.65
| | | | | | | | | |--- arrival_month <= 10.00
| | | | | | | | | | |--- length_stay <= 3.50
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0
| | | | | | | | | | |--- length_stay > 3.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- arrival_month > 10.00
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | |--- avg_price_per_room > 142.65
| | | | | | | | | |--- length_stay <= 2.50
| | | | | | | | | | |--- avg_price_per_room <= 216.45
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | | |--- avg_price_per_room > 216.45
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | |--- length_stay > 2.50
| | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0
| | | | | |--- required_car_parking_space > 0.50
| | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50
| | | | | | | |--- weights: [180.00, 0.00] class: 0.0
| | | | | | |--- room_type_reserved_Room_Type 7 > 0.50
| | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | |--- no_of_special_requests_log > 1.24
| | | |--- lead_time <= 90.50
| | | | |--- length_stay <= 12.00
| | | | | |--- length_stay <= 4.50
| | | | | | |--- length_stay <= 3.50
| | | | | | | |--- weights: [1689.00, 0.00] class: 0.0
| | | | | | |--- length_stay > 3.50
| | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50
| | | | | | | | |--- avg_price_per_room <= 90.05
| | | | | | | | | |--- lead_time <= 48.00
| | | | | | | | | | |--- arrival_month <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_month > 2.50
| | | | | | | | | | | |--- weights: [61.00, 0.00] class: 0.0
| | | | | | | | | |--- lead_time > 48.00
| | | | | | | | | | |--- avg_price_per_room <= 89.85
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- avg_price_per_room > 89.85
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | |--- avg_price_per_room > 90.05
| | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | |--- lead_time <= 54.50
| | | | | | | | | | | |--- weights: [221.00, 0.00] class: 0.0
| | | | | | | | | | |--- lead_time > 54.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | |--- lead_time <= 28.50
| | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0
| | | | | | | | | | |--- lead_time > 28.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50
| | | | | | | | |--- lead_time <= 31.00
| | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 31.00
| | | | | | | | | |--- avg_price_per_room <= 159.42
| | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 159.42
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | |--- length_stay > 4.50
| | | | | | |--- no_of_special_requests_log <= 1.50
| | | | | | | |--- length_stay <= 6.50
| | | | | | | | |--- avg_price_per_room <= 92.33
| | | | | | | | | |--- avg_price_per_room <= 90.95
| | | | | | | | | | |--- lead_time <= 54.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- lead_time > 54.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- avg_price_per_room > 90.95
| | | | | | | | | | |--- lead_time <= 11.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | | | |--- lead_time > 11.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- avg_price_per_room > 92.33
| | | | | | | | | |--- lead_time <= 80.50
| | | | | | | | | | |--- lead_time <= 11.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- lead_time > 11.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- lead_time > 80.50
| | | | | | | | | | |--- lead_time <= 81.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | | | |--- lead_time > 81.50
| | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0
| | | | | | | |--- length_stay > 6.50
| | | | | | | | |--- lead_time <= 9.00
| | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 9.00
| | | | | | | | | |--- lead_time <= 34.50
| | | | | | | | | | |--- avg_price_per_room <= 83.24
| | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0
| | | | | | | | | | |--- avg_price_per_room > 83.24
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- lead_time > 34.50
| | | | | | | | | | |--- lead_time <= 72.50
| | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0
| | | | | | | | | | |--- lead_time > 72.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | |--- no_of_special_requests_log > 1.50
| | | | | | | |--- weights: [69.00, 0.00] class: 0.0
| | | | |--- length_stay > 12.00
| | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | |--- lead_time > 90.50
| | | | |--- avg_price_per_room <= 202.95
| | | | | |--- arrival_month <= 8.50
| | | | | | |--- lead_time <= 150.50
| | | | | | | |--- length_stay <= 5.50
| | | | | | | | |--- avg_price_per_room <= 80.33
| | | | | | | | | |--- avg_price_per_room <= 76.37
| | | | | | | | | | |--- weights: [22.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 76.37
| | | | | | | | | | |--- lead_time <= 98.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | | | |--- lead_time > 98.00
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | |--- avg_price_per_room > 80.33
| | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | |--- lead_time <= 115.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | | | |--- lead_time > 115.00
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0
| | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | |--- length_stay > 5.50
| | | | | | | | |--- no_of_children_log <= 0.90
| | | | | | | | | |--- arrival_month <= 4.50
| | | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | |--- arrival_month > 4.50
| | | | | | | | | | |--- lead_time <= 142.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- lead_time > 142.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- no_of_children_log > 0.90
| | | | | | | | | |--- no_of_special_requests_log <= 1.50
| | | | | | | | | | |--- avg_price_per_room <= 58.27
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | | |--- avg_price_per_room > 58.27
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- no_of_special_requests_log > 1.50
| | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0
| | | | | | |--- lead_time > 150.50
| | | | | | | |--- avg_price_per_room <= 103.50
| | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | |--- avg_price_per_room > 103.50
| | | | | | | | |--- weights: [0.00, 3.00] class: 1.0
| | | | | |--- arrival_month > 8.50
| | | | | | |--- no_of_special_requests_log <= 1.50
| | | | | | | |--- avg_price_per_room <= 90.42
| | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | |--- lead_time <= 107.00
| | | | | | | | | | |--- avg_price_per_room <= 70.52
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0
| | | | | | | | | | |--- avg_price_per_room > 70.52
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- lead_time > 107.00
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | |--- lead_time <= 101.00
| | | | | | | | | | |--- weights: [11.00, 0.00] class: 0.0
| | | | | | | | | |--- lead_time > 101.00
| | | | | | | | | | |--- lead_time <= 104.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | | |--- lead_time > 104.00
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | |--- avg_price_per_room > 90.42
| | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | |--- weights: [11.00, 0.00] class: 0.0
| | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | |--- avg_price_per_room <= 153.15
| | | | | | | | | | |--- avg_price_per_room <= 92.60
| | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0
| | | | | | | | | | |--- avg_price_per_room > 92.60
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | |--- avg_price_per_room > 153.15
| | | | | | | | | | |--- lead_time <= 100.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 100.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | |--- no_of_special_requests_log > 1.50
| | | | | | | |--- weights: [52.00, 0.00] class: 0.0
| | | | |--- avg_price_per_room > 202.95
| | | | | |--- weights: [0.00, 7.00] class: 1.0
|--- lead_time > 151.50
| |--- avg_price_per_room <= 100.04
| | |--- no_of_special_requests_log <= 0.90
| | | |--- market_segment_type_Online <= 0.50
| | | | |--- no_of_adults <= 1.50
| | | | | |--- lead_time <= 163.50
| | | | | | |--- lead_time <= 162.50
| | | | | | | |--- arrival_month <= 6.50
| | | | | | | | |--- weights: [4.00, 0.00] class: 0.0
| | | | | | | |--- arrival_month > 6.50
| | | | | | | | |--- weights: [1.00, 1.00] class: 0.0
| | | | | | |--- lead_time > 162.50
| | | | | | | |--- weights: [0.00, 15.00] class: 1.0
| | | | | |--- lead_time > 163.50
| | | | | | |--- lead_time <= 341.00
| | | | | | | |--- lead_time <= 173.00
| | | | | | | | |--- lead_time <= 165.50
| | | | | | | | | |--- length_stay <= 1.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | |--- length_stay > 1.50
| | | | | | | | | | |--- weights: [61.00, 6.00] class: 0.0
| | | | | | | | |--- lead_time > 165.50
| | | | | | | | | |--- avg_price_per_room <= 77.35
| | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 77.35
| | | | | | | | | | |--- weights: [0.00, 9.00] class: 1.0
| | | | | | | |--- lead_time > 173.00
| | | | | | | | |--- arrival_month <= 5.50
| | | | | | | | | |--- avg_price_per_room <= 88.00
| | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 88.00
| | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0
| | | | | | | | |--- arrival_month > 5.50
| | | | | | | | | |--- avg_price_per_room <= 98.00
| | | | | | | | | | |--- avg_price_per_room <= 55.21
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 55.21
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- avg_price_per_room > 98.00
| | | | | | | | | | |--- lead_time <= 231.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | | |--- lead_time > 231.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | |--- lead_time > 341.00
| | | | | | | |--- length_stay <= 5.50
| | | | | | | | |--- lead_time <= 402.00
| | | | | | | | | |--- avg_price_per_room <= 80.00
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 80.00
| | | | | | | | | | |--- lead_time <= 381.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 381.50
| | | | | | | | | | | |--- weights: [3.00, 2.00] class: 0.0
| | | | | | | | |--- lead_time > 402.00
| | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0
| | | | | | | |--- length_stay > 5.50
| | | | | | | | |--- avg_price_per_room <= 88.33
| | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0
| | | | | | | | |--- avg_price_per_room > 88.33
| | | | | | | | | |--- weights: [1.00, 1.00] class: 0.0
| | | | |--- no_of_adults > 1.50
| | | | | |--- avg_price_per_room <= 84.58
| | | | | | |--- lead_time <= 244.00
| | | | | | | |--- length_stay <= 2.50
| | | | | | | | |--- lead_time <= 166.50
| | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 166.50
| | | | | | | | | |--- lead_time <= 229.50
| | | | | | | | | | |--- avg_price_per_room <= 69.34
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | | |--- avg_price_per_room > 69.34
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- lead_time > 229.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | |--- length_stay > 2.50
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | |--- avg_price_per_room <= 27.07
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | |--- avg_price_per_room > 27.07
| | | | | | | | | | |--- avg_price_per_room <= 66.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 66.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | |--- lead_time > 244.00
| | | | | | | |--- arrival_month <= 11.50
| | | | | | | | |--- avg_price_per_room <= 75.83
| | | | | | | | | |--- length_stay <= 1.50
| | | | | | | | | | |--- arrival_month <= 6.50
| | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1.0
| | | | | | | | | | |--- arrival_month > 6.50
| | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0
| | | | | | | | | |--- length_stay > 1.50
| | | | | | | | | | |--- length_stay <= 6.00
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- length_stay > 6.00
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | |--- avg_price_per_room > 75.83
| | | | | | | | | |--- lead_time <= 292.50
| | | | | | | | | | |--- length_stay <= 6.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- length_stay > 6.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- lead_time > 292.50
| | | | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | | | |--- weights: [0.00, 23.00] class: 1.0
| | | | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- arrival_month > 11.50
| | | | | | | | |--- weights: [37.00, 0.00] class: 0.0
| | | | | |--- avg_price_per_room > 84.58
| | | | | | |--- arrival_month <= 11.50
| | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | |--- lead_time <= 316.00
| | | | | | | | | |--- no_of_adults <= 2.50
| | | | | | | | | | |--- market_segment_type_Offline <= 0.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | | |--- market_segment_type_Offline > 0.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- no_of_adults > 2.50
| | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 316.00
| | | | | | | | | |--- lead_time <= 338.00
| | | | | | | | | | |--- weights: [7.00, 0.00] class: 0.0
| | | | | | | | | |--- lead_time > 338.00
| | | | | | | | | | |--- weights: [1.00, 5.00] class: 1.0
| | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | |--- weights: [6.00, 0.00] class: 0.0
| | | | | | |--- arrival_month > 11.50
| | | | | | | |--- weights: [9.00, 0.00] class: 0.0
| | | |--- market_segment_type_Online > 0.50
| | | | |--- avg_price_per_room <= 2.50
| | | | | |--- no_of_adults <= 1.50
| | | | | | |--- lead_time <= 285.50
| | | | | | | |--- weights: [11.00, 0.00] class: 0.0
| | | | | | |--- lead_time > 285.50
| | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | |--- no_of_adults > 1.50
| | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | |--- avg_price_per_room > 2.50
| | | | | |--- arrival_month <= 11.50
| | | | | | |--- weights: [0.00, 525.00] class: 1.0
| | | | | |--- arrival_month > 11.50
| | | | | | |--- length_stay <= 3.50
| | | | | | | |--- lead_time <= 204.00
| | | | | | | | |--- weights: [0.00, 11.00] class: 1.0
| | | | | | | |--- lead_time > 204.00
| | | | | | | | |--- lead_time <= 214.50
| | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 214.50
| | | | | | | | | |--- lead_time <= 275.50
| | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0
| | | | | | | | | |--- lead_time > 275.50
| | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0
| | | | | | |--- length_stay > 3.50
| | | | | | | |--- avg_price_per_room <= 80.51
| | | | | | | | |--- weights: [0.00, 41.00] class: 1.0
| | | | | | | |--- avg_price_per_room > 80.51
| | | | | | | | |--- avg_price_per_room <= 81.43
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | |--- avg_price_per_room > 81.43
| | | | | | | | | |--- weights: [0.00, 13.00] class: 1.0
| | |--- no_of_special_requests_log > 0.90
| | | |--- market_segment_type_Offline <= 0.50
| | | | |--- lead_time <= 180.50
| | | | | |--- lead_time <= 159.50
| | | | | | |--- arrival_month <= 8.50
| | | | | | | |--- lead_time <= 152.50
| | | | | | | | |--- avg_price_per_room <= 90.81
| | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50
| | | | | | | | | | |--- avg_price_per_room <= 83.62
| | | | | | | | | | | |--- weights: [1.00, 2.00] class: 1.0
| | | | | | | | | | |--- avg_price_per_room > 83.62
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | |--- avg_price_per_room > 90.81
| | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | |--- lead_time > 152.50
| | | | | | | | |--- lead_time <= 156.50
| | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0
| | | | | | | | |--- lead_time > 156.50
| | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | |--- length_stay <= 4.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- length_stay > 4.50
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0
| | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0
| | | | | | |--- arrival_month > 8.50
| | | | | | | |--- avg_price_per_room <= 87.12
| | | | | | | | |--- lead_time <= 158.50
| | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0
| | | | | | | | |--- lead_time > 158.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | |--- avg_price_per_room > 87.12
| | | | | | | | |--- length_stay <= 2.50
| | | | | | | | | |--- avg_price_per_room <= 89.62
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 89.62
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | |--- length_stay > 2.50
| | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0
| | | | | |--- lead_time > 159.50
| | | | | | |--- no_of_adults <= 0.50
| | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | |--- no_of_adults > 0.50
| | | | | | | |--- avg_price_per_room <= 93.44
| | | | | | | | |--- length_stay <= 5.50
| | | | | | | | | |--- lead_time <= 162.50
| | | | | | | | | | |--- lead_time <= 161.50
| | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0
| | | | | | | | | | |--- lead_time > 161.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | |--- lead_time > 162.50
| | | | | | | | | | |--- length_stay <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- length_stay > 1.50
| | | | | | | | | | | |--- weights: [52.00, 0.00] class: 0.0
| | | | | | | | |--- length_stay > 5.50
| | | | | | | | | |--- avg_price_per_room <= 88.38
| | | | | | | | | | |--- weights: [8.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 88.38
| | | | | | | | | | |--- lead_time <= 176.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0
| | | | | | | | | | |--- lead_time > 176.00
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0
| | | | | | | |--- avg_price_per_room > 93.44
| | | | | | | | |--- lead_time <= 178.50
| | | | | | | | | |--- avg_price_per_room <= 93.67
| | | | | | | | | | |--- lead_time <= 169.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0
| | | | | | | | | | |--- lead_time > 169.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0
| | | | | | | | | |--- avg_price_per_room > 93.67
| | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | |--- lead_time > 178.50
| | | | | | | | | |--- lead_time <= 179.50
| | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0
| | | | | | | | | |--- lead_time > 179.50
| | | | | | | | | | |--- arrival_month <= 9.00
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0
| | | | | | | | | | |--- arrival_month > 9.00
| | | | | | | | | | | |--- weights: [4.00, 1.00] class: 0.0
| | | | |--- lead_time > 180.50
| | | | | |--- length_stay <= 3.50
| | | | | | |--- no_of_special_requests_log <= 1.50
| | | | | | | |--- lead_time <= 187.50
| | | | | | | | |--- arrival_month <= 4.00
| | | | | | |
| | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 4.00 | | | | | | | | | |--- avg_price_per_room <= 78.30 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 78.30 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [0.00, 20.00] class: 1.0 | | | | | | | |--- lead_time > 187.50 | | | | | | | | |--- lead_time <= 304.50 | | | | | | | | | |--- avg_price_per_room <= 78.90 | | | | | | | | | | |--- lead_time <= 237.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 237.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 78.90 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- lead_time > 304.50 | | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1.0 | | | | | | |--- no_of_special_requests_log > 1.50 | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | |--- length_stay > 3.50 | | | | | | |--- length_stay <= 13.50 | | | | | | | |--- no_of_special_requests_log <= 1.50 | | | | | | | | |--- avg_price_per_room <= 68.32 | | | | | | | | | |--- arrival_month <= 11.00 | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 11.00 | | | | | | | | | | |--- lead_time <= 247.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 247.00 
| | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 68.32 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- avg_price_per_room <= 81.12 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 81.12 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- lead_time <= 292.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- lead_time > 292.50 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | |--- no_of_special_requests_log > 1.50 | | | | | | | | |--- weights: [17.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 13.50 | | | | | | | |--- weights: [0.00, 5.00] class: 1.0 | | | |--- market_segment_type_Offline > 0.50 | | | | |--- lead_time <= 368.00 | | | | | |--- lead_time <= 348.50 | | | | | | |--- no_of_adults <= 2.50 | | | | | | | |--- length_stay <= 7.50 | | | | | | | | |--- lead_time <= 331.00 | | | | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | | | | |--- weights: [137.00, 0.00] class: 0.0 | | | | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | | | | |--- length_stay <= 5.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 5.50 | | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | | |--- lead_time > 331.00 | | | | | | | | | |--- lead_time <= 336.50 | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | | | |--- lead_time > 336.50 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 7.50 | | | | | | | | |--- avg_price_per_room <= 80.74 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 80.74 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- no_of_adults > 2.50 | | | | | | | |--- lead_time <= 196.00 | 
| | | | | | | |--- weights: [7.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 196.00 | | | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- lead_time > 348.50 | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | |--- weights: [6.00, 2.00] class: 0.0 | | | | |--- lead_time > 368.00 | | | | | |--- lead_time <= 381.50 | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 381.50 | | | | | | |--- weights: [1.00, 1.00] class: 0.0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests_log <= 1.50 | | | | |--- weights: [0.00, 2108.00] class: 1.0 | | | |--- no_of_special_requests_log > 1.50 | | | | |--- weights: [31.00, 0.00] class: 0.0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests_log <= 0.90 | | | | |--- weights: [47.00, 0.00] class: 0.0 | | | |--- no_of_special_requests_log > 0.90 | | | | |--- lead_time <= 289.50 | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | |--- avg_price_per_room <= 114.59 | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 114.59 | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | |--- avg_price_per_room <= 199.65 | | | | | | | |--- avg_price_per_room <= 110.46 | | | | | | | | |--- length_stay <= 5.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 5.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 110.46 | | | | | | | | |--- weights: [7.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 199.65 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | |--- lead_time 
> 289.50 | | | | | |--- weights: [0.00, 7.00] class: 1.0
# checking which variables are being prioritized by the model
print (pd.DataFrame(dtree.feature_importances_, columns = ["Imp"], index = x_train.columns).sort_values(by = 'Imp', ascending = False))
                                               Imp
lead_time                                 0.399009
avg_price_per_room                        0.204342
market_segment_type_Online                0.092752
arrival_month                             0.083932
length_stay                               0.074351
no_of_special_requests_log                0.068483
no_of_adults                              0.030179
type_of_meal_plan_Not Selected            0.010868
required_car_parking_space                0.007417
room_type_reserved_Room_Type 4            0.007339
no_of_children_log                        0.006595
type_of_meal_plan_Meal Plan 2             0.004375
market_segment_type_Offline               0.003333
room_type_reserved_Room_Type 2            0.002304
room_type_reserved_Room_Type 5            0.001582
market_segment_type_Corporate             0.001144
room_type_reserved_Room_Type 6            0.000674
repeated_guest                            0.000464
room_type_reserved_Room_Type 7            0.000398
no_of_previous_cancellations_log          0.000323
no_of_previous_bookings_not_canceled_log  0.000137
room_type_reserved_Room_Type 3            0.000000
market_segment_type_Complementary         0.000000
type_of_meal_plan_Meal Plan 3             0.000000
importances = dtree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(10,10))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='red', align='center')
plt.yticks(range(len(indices)),[the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
Do we need to prune the tree?¶
Yes, we need to prune the tree: the fully grown tree fits the training data far better than the test data.
## pre-prune the model with the max_depth hyperparameter
dTree1 = DecisionTreeClassifier(criterion = 'gini', max_depth=3,random_state=1)
dTree1.fit(x_train, y_train)
DecisionTreeClassifier(max_depth=3, random_state=1)
# another confusion matrix
make_confusion_matrix(dTree1,y_test)
# the accuracy on the pre pruned tree
print("Accuracy on training set : ", dTree1.score(x_train, y_train))
print("Accuracy on test set : ", dTree1.score(x_test, y_test))
# checking the recall with the get_recall_score userdefined function
get_recall_score(dTree1)
Accuracy on training set : 0.7844202898550725
Accuracy on test set : 0.7913259211614444
Recall on training set : 0.7315556618438359
Recall on test set : 0.7385008517887564
With training and test scores now much closer together, this model has eliminated most of the noise fitted by the first model (dTree).
Accuracy of about 78% on the training set and 79% on the test set is consistent across both sets, which is also an improvement.
Recall is nearly identical on both sets as well (0.73 vs. 0.74), so this model generalizes much better than the first one.
# Let's see the pre-pruned tree
plt.figure(figsize=(15,10))
tree.plot_tree(dTree1,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
print(tree.export_text(dTree1,feature_names=the_features,show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests_log <= 0.90
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [4614.00, 781.00] class: 0.0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [2504.00, 2768.00] class: 1.0
|   |--- no_of_special_requests_log > 0.90
|   |   |--- no_of_special_requests_log <= 1.24
|   |   |   |--- weights: [5624.00, 1055.00] class: 0.0
|   |   |--- no_of_special_requests_log > 1.24
|   |   |   |--- weights: [2919.00, 145.00] class: 0.0
|--- lead_time > 151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests_log <= 0.90
|   |   |   |--- weights: [694.00, 1242.00] class: 1.0
|   |   |--- no_of_special_requests_log > 0.90
|   |   |   |--- weights: [586.00, 249.00] class: 0.0
|   |--- avg_price_per_room > 100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- weights: [31.00, 2108.00] class: 1.0
|   |   |--- arrival_month > 11.50
|   |   |   |--- weights: [57.00, 15.00] class: 0.0
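As a sanity check, the leaf weights in the text dump above can be used to recompute the training metrics by hand. The sketch below copies the eight leaves (each weight pair is read as [class 0 count, class 1 count] on the training set, assuming class 1 marks a canceled booking) and derives recall and accuracy from them. The variable names are illustrative and not part of the notebook.

```python
# Leaves of the depth-3 tree, copied from the export_text output above.
# Each entry: (count of class 0, count of class 1, class predicted by the leaf).
leaves = [
    (4614, 781, 0),
    (2504, 2768, 1),
    (5624, 1055, 0),
    (2919, 145, 0),
    (694, 1242, 1),
    (586, 249, 0),
    (31, 2108, 1),
    (57, 15, 0),
]

# True positives: class-1 samples that land in a leaf predicting class 1.
tp = sum(n1 for n0, n1, pred in leaves if pred == 1)
# True negatives: class-0 samples that land in a leaf predicting class 0.
tn = sum(n0 for n0, n1, pred in leaves if pred == 0)
positives = sum(n1 for _, n1, _ in leaves)
total = sum(n0 + n1 for n0, n1, _ in leaves)

train_recall = tp / positives
train_accuracy = (tp + tn) / total

print(f"Recall on training set   : {train_recall:.4f}")
print(f"Accuracy on training set : {train_accuracy:.4f}")
```

Both figures agree with the training recall and accuracy reported for dTree1, which confirms that the printed leaves describe the whole training set.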
# Looking at the feature importances of this model
importances = dTree1.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(10,10))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
Let's use GridSearch to hyperparameter tune the model
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
## add from article
parameters = {'max_depth': np.arange(1,10),
'min_samples_leaf': [1, 2, 5, 7, 10,15,20],
'max_leaf_nodes' : [2, 3, 5, 10],
'min_impurity_decrease': [0.001,0.01,0.1]
}
# scoring function used to compare parameter combinations (recall, despite the acc_ prefix in the name)
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(x_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(x_train, y_train)
DecisionTreeClassifier(max_depth=3, max_leaf_nodes=5,
min_impurity_decrease=0.001, random_state=1)
# run the estimator in a confusion matrix
make_confusion_matrix(estimator,y_test)
# The accuracy on the estimator tree.
print("Accuracy on training set : ",estimator.score(x_train, y_train))
print("Accuracy on test set : ",estimator.score(x_test, y_test))
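# Check the recall with the get_recall_score user defined function
# Note: dTree1 is passed here, so the recall printed below is that of the depth-3 tree;
# call get_recall_score(estimator) to score the tuned model instead.
get_recall_score(dTree1)
Accuracy on training set : 0.7694943289224953
Accuracy on test set : 0.7719378847744188
Recall on training set : 0.7315556618438359
Recall on test set : 0.7385008517887564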
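To get a feel for how much work the grid search above performs, the number of candidate models can be counted directly from the grid. The following is a minimal sketch that rewrites the same grid with plain Python lists (replacing `np.arange(1,10)` with `range(1, 10)`); `cv_folds`, `candidates`, `n_candidates`, and `n_fits` are names introduced here for illustration.

```python
from itertools import product

# Same grid as in the search above, written with plain Python ranges/lists.
parameters = {
    "max_depth": list(range(1, 10)),  # np.arange(1, 10) yields 1..9
    "min_samples_leaf": [1, 2, 5, 7, 10, 15, 20],
    "max_leaf_nodes": [2, 3, 5, 10],
    "min_impurity_decrease": [0.001, 0.01, 0.1],
}
cv_folds = 5  # matches cv=5 in the GridSearchCV call

# Every combination of one value per hyperparameter is one candidate model.
candidates = list(product(*parameters.values()))
n_candidates = len(candidates)
n_fits = n_candidates * cv_folds  # each candidate is fitted once per fold

print(n_candidates, "candidates,", n_fits, "cross-validation fits")
```

With 9 x 7 x 4 x 3 = 756 candidates and 5 folds, the search runs 3780 fits before the final refit. Many of these candidates likely produce the same tree, since `max_leaf_nodes` of at most 10 already limits how deep the tree can usefully grow.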
plt.figure(figsize=(15,10))
tree.plot_tree(estimator,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
Cost Complexity Pruning
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(x_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000 | 0.013864 |
| 1 | 0.000000 | 0.013864 |
| 2 | 0.000000 | 0.013864 |
| 3 | 0.000000 | 0.013864 |
| 4 | 0.000000 | 0.013864 |
| ... | ... | ... |
| 1580 | 0.006666 | 0.286897 |
| 1581 | 0.013045 | 0.299942 |
| 1582 | 0.017260 | 0.317202 |
| 1583 | 0.023990 | 0.365183 |
| 1584 | 0.076578 | 0.441761 |
1585 rows × 2 columns
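Each row of the pruning path corresponds to collapsing the weakest remaining subtree. In scikit-learn's formulation, the effective alpha of a node t is (R(t) - R(T_t)) / (|T_t| - 1), where |T_t| is the number of leaves under t, so the impurity increase between two consecutive rows divided by alpha gives the number of leaves removed in that step. The sketch below applies this to the last five rows of the table above, using the rounded values as displayed and assuming a single node is collapsed per step; the variable names are illustrative.

```python
# (ccp_alpha, total leaf impurity) for rows 1580-1584 of the table above.
path_tail = [
    (0.006666, 0.286897),
    (0.013045, 0.299942),
    (0.017260, 0.317202),
    (0.023990, 0.365183),
    (0.076578, 0.441761),
]

leaves_removed = []
for (_, prev_impurity), (alpha, impurity) in zip(path_tail, path_tail[1:]):
    # alpha = impurity increase / leaves removed, so
    # leaves removed = impurity increase / alpha
    leaves_removed.append(round((impurity - prev_impurity) / alpha))

# The largest alpha prunes the tree down to its root (one leaf).
# Walk backwards from there to recover the leaf count at each row.
leaf_counts = [1]
for removed in reversed(leaves_removed):
    leaf_counts.insert(0, leaf_counts[0] + removed)

print("leaves removed per step:", leaves_removed)
print("leaves at rows 1580-1584:", leaf_counts)
```

Read this way, the final rows of the path describe very small trees, which is why the largest alphas are usually of little practical interest.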
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker='o', drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
# Decision tree classifier for every alpha
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    clf.fit(x_train, y_train)
    clfs.append(clf)
print("Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
    clfs[-1].tree_.node_count, ccp_alphas[-1]))
Number of nodes in the last tree is: 1 with ccp_alpha: 0.0765778947737134
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1,figsize=(10,7))
ax[0].plot(ccp_alphas, node_counts, marker='o', drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker='o', drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
train_scores = [clf.score(x_train, y_train) for clf in clfs]
test_scores = [clf.score(x_test, y_test) for clf in clfs]
fig, ax = plt.subplots(figsize=(10,5))
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(ccp_alphas, train_scores, marker='o', label="train",
drawstyle="steps-post")
ax.plot(ccp_alphas, test_scores, marker='o', label="test",
drawstyle="steps-post")
ax.legend()
plt.show()
index_best_model = np.argmax(test_scores)
best_model = clfs[index_best_model]
print(best_model)
print('Training accuracy of best model: ',best_model.score(x_train, y_train))
print('Test accuracy of best model: ',best_model.score(x_test, y_test))
DecisionTreeClassifier(ccp_alpha=9.904212140385933e-05, random_state=1)
Training accuracy of best model: 0.90544265910523
Test accuracy of best model: 0.8783423688321235
# recall on the training set for every pruned tree
recall_train=[]
for clf in clfs:
    pred_train3=clf.predict(x_train)
    values_train=metrics.recall_score(y_train,pred_train3)
    recall_train.append(values_train)
# recall on the test set for every pruned tree
recall_test=[]
for clf in clfs:
    pred_test3=clf.predict(x_test)
    values_test=metrics.recall_score(y_test,pred_test3)
    recall_test.append(values_test)
fig, ax = plt.subplots(figsize=(15,5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker='o', label="train",
drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker='o', label="test",
drawstyle="steps-post")
ax.legend()
plt.show()
Model Performance Comparison and Conclusions¶
# selecting the pruned model with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=2.450465588461807e-05, random_state=1)
#another confusion matrix
make_confusion_matrix(best_model,y_test)
# Recall on train and test
get_recall_score(best_model)
Recall on training set : 0.9790744947985173
Recall on test set : 0.7935831913685406
# showing which features this model relied on
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
comparison_frame = pd.DataFrame({'Model':['Initial decision tree model','Decision tree with restricted maximum depth','Decision tree with hyperparameter tuning',
'Decision tree with post-pruning'], 'Train_Recall':[.981,.732,.732,.979], 'Test_Recall':[.792,.739,.739,.794]})
comparison_frame
| | Model | Train_Recall | Test_Recall |
|---|---|---|---|
| 0 | Initial decision tree model | 0.981 | 0.792 |
| 1 | Decision tree with restricted maximum depth | 0.732 | 0.739 |
| 2 | Decision tree with hyperparameter tuning | 0.732 | 0.739 |
| 3 | Decision tree with post-pruning | 0.979 | 0.794 |
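The table is easier to read when the gap between train and test recall is made explicit, since a large positive gap is the usual sign of overfitting. A small sketch using only the values from the table above (the `models` and `gaps` names are introduced here for illustration):

```python
# Recall values copied from the comparison table above: (train, test).
models = {
    "Initial decision tree model": (0.981, 0.792),
    "Decision tree with restricted maximum depth": (0.732, 0.739),
    "Decision tree with hyperparameter tuning": (0.732, 0.739),
    "Decision tree with post-pruning": (0.979, 0.794),
}

# Gap between train and test recall; a large positive gap suggests overfitting.
gaps = {name: round(train - test, 3) for name, (train, test) in models.items()}

for name, gap in gaps.items():
    print(f"{name:45s} gap = {gap:+.3f}")
```

The initial and post-pruned trees have the highest test recall but also a gap of nearly 0.19, while the two shallow trees give up about five points of test recall in exchange for train and test scores that are practically equal.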
Actionable Insights and Recommendations¶
- What profitable policies for cancellations and refunds can the hotel adopt?
- What other recommendations would you suggest to the hotel?
Insights
- Bookings made 151 days (about 5 months) or less in advance were much less likely to be canceled.
- Bookings made more than 151 days in advance were more likely to be canceled.
- After lead time, price was the most influential factor in cancellations.
- Among bookings made more than 151 days ahead, cancellations were more likely when the room was priced above 100.04 euros.
- Guests found better deals when they booked early.
My Recommendations
Require a nonrefundable deposit on all rooms booked more than 5 months in advance.
Offer your best room rates for bookings made more than 5 months ahead. Closer to the arrival date, you may raise prices slightly to increase your profit.
Offer a menu of special requests at booking, such as:
- VIP
- Room upgrades
- Earning points
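The deposit recommendation can be expressed as a simple rule built on the split thresholds of the depth-3 tree shown earlier (lead time 151.5 days, price 100.04 euros, arrival month 11.5). The sketch below is only an illustration: the function names and the idea of applying the thresholds directly as booking policy are assumptions, not something the notebook implements.

```python
# Thresholds taken from the depth-3 tree printed earlier in the notebook.
LEAD_TIME_SPLIT = 151.5   # days between booking and arrival
PRICE_SPLIT = 100.04      # average price per room, in euros
MONTH_SPLIT = 11.5        # arrival month (1-12)


def needs_nonrefundable_deposit(lead_time_days: float) -> bool:
    """Hypothetical policy: require a deposit for bookings made far in advance."""
    return lead_time_days > LEAD_TIME_SPLIT


def in_high_risk_leaf(lead_time_days: float, avg_price: float, arrival_month: int) -> bool:
    """True when a booking falls in the leaf where 2108 of 2139 training
    bookings were class 1 (long lead time, higher price, arrival before December)."""
    return (
        lead_time_days > LEAD_TIME_SPLIT
        and avg_price > PRICE_SPLIT
        and arrival_month <= MONTH_SPLIT
    )


# Example: a booking made 200 days ahead at 120 euros for a June arrival.
print(needs_nonrefundable_deposit(200))    # True
print(in_high_risk_leaf(200, 120.0, 6))    # True
print(in_high_risk_leaf(200, 120.0, 12))   # False (December arrival)
```

A rule like this is easy to explain to booking staff, but the thresholds come from a single fitted tree and would need to be revisited whenever the model is retrained.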